Cisco MDS 9000 Series Configuration Manual page 153

Congestion Detection, Avoidance, and Isolation
both directions. If the receiver responds with a Link Credit Reset Response (LRR), the credits are restored
and the link resumes normal operation.
If the congestion is severe, the LRR may not be returned and the link fails with the "LR failed due to
timeout" error. Credit Loss Recovery can be initiated from either side of the link.
If MDS is the receiver of the LR (because the adjacent device initiated the credit-loss recovery), MDS can
return an LRR only when the input buffers of the interface are empty. If the interface still holds
frames that it received but could not forward to the destination interface, the link fails
with the "LR failed nonempty receive queue" error. If the LR and LRR sequence is successful, the link
returns to normal operation. Even so, the 1 or 1.5 seconds spent at zero Tx
credits cause severe backward congestion in the SAN. This backward congestion can work its way back
all the way to the source of the frames. Servers or initiators typically record a large number of IO errors
due to the large number of timeout drops.
Frames and BB_credits are not delivered reliably. If a received frame is so corrupt that it cannot
be recognized, the receiver of that frame does not return a BB_credit. Alternatively, if a frame is received
intact and the BB_credit is returned but is corrupted in transmission on the link, the receiver of that
BB_credit does not recognize it as a BB_credit. In both cases, a transmit credit is lost. Credit Loss
Recovery (LR or LRR) results when all the transmit credits are lost over time. BB_SCN is used to recover
credits before the port runs out of them completely and causes congestion. Counts of frames and of returned
credits are exchanged periodically, and if the counts disagree, the missing credits can
be recovered. BB_SCN is available on all ISLs and is extended to F ports from Cisco MDS NX-OS
Release 8.2(1).
When a link first initializes, an LR and LRR sequence occurs as part of normal operation and does not
indicate a Level 3 slow drain condition.
Although severe congestion can occur in both Fibre Channel and FCoE, the Link Credit Reset (LR or
LRR) actions apply only to Fibre Channel.
• Level 2: Indicates moderate congestion that is causing frames to drop because the congestion drop
timeout threshold has been reached. Each frame that is received on a port is timestamped. If the frame
cannot be transmitted to the appropriate egress port within the congestion drop threshold of the switch,
the frame is dropped to prevent excessive internal congestion in the switch. Each dropped frame is part of
a SCSI exchange and causes that exchange to fail. Servers or initiators record IO errors and terminate when
the SCSI exchange fails. Other devices that use the shared infrastructure see large delays in their
IO completion times. The congestion drop threshold is 500 ms by default and can be set as low as 200
ms. The congestion drop threshold can be set separately for Fibre Channel and FCoE ports.
• Level 1 and Level 1.5: Indicate delay that occurs when frames cannot be transmitted immediately out
of an egress port because the port has no Tx buffer-to-buffer credits in Fibre Channel or is in an Rx
Pause state in FCoE. The amount of delay is measured by the TxWait counter and can be expressed as
a percentage of time. For example, if a port is unable to transmit for 200 ms (not necessarily continuous)
in a 1-second interval, the TxWait congestion percentage for that interval is 20%.
Level 1.5 indicates a more severe level of delay and is reserved for TxWait greater
than or equal to 30%. Level 1 indicates instances when TxWait is less than 30%.
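The TxWait arithmetic and the 30% boundary between Level 1 and Level 1.5 can be sketched as follows. This is a minimal illustration, not switch code: the function names are hypothetical, and treating 0% TxWait as "no delay" is an assumption that the text does not state explicitly.

```python
# Hypothetical helpers; only the 30% boundary and the 200 ms example come from the text.
LEVEL_1_5_THRESHOLD_PCT = 30.0  # TxWait >= 30% is classified as Level 1.5


def txwait_percent(blocked_ms: float, interval_ms: float = 1000.0) -> float:
    """Return the share of the interval during which the port could not transmit."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    if not 0 <= blocked_ms <= interval_ms:
        raise ValueError("blocked_ms must lie between 0 and interval_ms")
    return 100.0 * blocked_ms / interval_ms


def delay_level(pct: float) -> str:
    """Classify a TxWait percentage as Level 1 or Level 1.5."""
    if pct <= 0:
        return "none"  # assumption: no TxWait means no slow-drain delay
    return "1.5" if pct >= LEVEL_1_5_THRESHOLD_PCT else "1"


if __name__ == "__main__":
    pct = txwait_percent(200)  # 200 ms blocked within a 1-second interval
    print(pct, delay_level(pct))
```

The blocked time does not need to be continuous; only its total within the interval matters for the percentage.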
Almost always, higher levels of slow drain include the lower levels. For example, Level 3 slow drain
includes Level 2, Level 1.5, and Level 1 because the inability to transmit causes delay, and the delay
causes frames to be dropped on timeout. Longer delay causes credit loss recovery to be initiated.
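The credit loss that leads to recovery, and the count comparison that BB_SCN uses to avoid it, can be illustrated with a toy model of the transmitter's credit accounting. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, only the lost-credit case (a returned BB_credit corrupted on the link) is modeled, and the peer's count is passed in directly rather than carried in any real protocol primitive.

```python
class TxCreditModel:
    """Toy model of one transmitter's view of BB_credits (hypothetical, not NX-OS code)."""

    def __init__(self, agreed_credits: int) -> None:
        self.agreed = agreed_credits      # credits granted at link initialization
        self.available = agreed_credits   # remaining Tx credits
        self.credits_seen = 0             # BB_credits recognized so far

    def send_frame(self) -> bool:
        """Consume one credit per frame; refuse to send at zero Tx credits."""
        if self.available == 0:
            return False
        self.available -= 1
        return True

    def receive_credit(self, corrupted: bool = False) -> None:
        """A corrupted BB_credit is not recognized, so that credit is lost."""
        if corrupted:
            return
        self.available += 1
        self.credits_seen += 1

    def reconcile(self, credits_returned_by_peer: int) -> int:
        """Compare counts with the peer and restore any missing credits."""
        lost = max(credits_returned_by_peer - self.credits_seen, 0)
        self.available = min(self.agreed, self.available + lost)
        self.credits_seen += lost
        return lost


if __name__ == "__main__":
    link = TxCreditModel(agreed_credits=2)
    link.send_frame()
    link.send_frame()
    link.receive_credit()                # recognized
    link.receive_credit(corrupted=True)  # lost on the link
    print(link.available, link.reconcile(credits_returned_by_peer=2), link.available)
```

Without the reconciliation step, each lost credit permanently shrinks the transmit window until the port reaches zero Tx credits and a full LR and LRR exchange is needed.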
The following terms are used in the document:
• Buffer-to-Buffer (BB) credits (Fibre Channel only): BB_credits are a link flow control mechanism
that is used in Fibre Channel. A Fibre Channel frame can only be transmitted if the remaining Tx
Information About SAN Congestion Caused by Slow-Drain Devices
Cisco MDS 9000 Series Interfaces Configuration Guide, Release 8.x
