I’ve pulled more than a few hairs from my head troubleshooting stack (cascade) link issues when stacking multiple Nortel Ethernet Switch 460 or Ethernet Switch 470 switches together. I thought I would try to throw together a quick process for testing the cascade module and cable. I hope to make a follow-up post covering the Ethernet Routing Switch 5500 series at a later time.
Let me describe a typical scenario and then offer some ways of isolating the potential problem. You have a stack of four ES470s that we’ll refer to as Unit 1, Unit 2, Unit 3 and Unit 4. We can use the picture to the left to visualize what a stack of four Ethernet Switch 470s might look like. All the Up/Down stack lights should be green, but let’s say that Unit 3 Down and Unit 4 Up are amber.
Let me just warn you that I have yet to figure out how to truly tell a bad cascade module (the module that is built into the switch) from a bad cable without using a cascade module that is known to be good and/or a cascade cable that is known to be good in a process of elimination.
How can you determine if you have a bad cascade cable or cascade module?
It’s really pretty easy, although it will require you to take the switch down and use the diagnostic boot code. You’ll need to cable up to the serial interface of the switch in order to run the test. When you’re ready, go ahead and cold boot the switch. When you see the following, “470-24T Diagnostics 188.8.131.52” (or something similar, since you may have a 48T rather than a 24T), you’ll need to interrupt the boot sequence by hitting Ctrl-C (go ahead and hit it repeatedly). You should see something similar to the following:
470-24T Diagnostics 184.108.40.206
Testing main memory - PASSED
>> Break Recognized - Wait..
>> Break Recognized - Wait..
Press 'a' to run Agent code
Press 'c' to run Cascade external loopback test
Press 'd' to Download agent code
Press 'e' to display Errors
Press 'i' to Initialize config/log flash
Press 'p' to run POST tests
Press 'r' to Receive cascade test packets
Press 's' to Send cascade test packets..
Once you’re at this point you’ll need to take a single cascade cable and loop it between the Up and Down ports of the switch you’re working on. This will put a physical loop between the two interfaces so we can run an external loopback test across the cascade links. When you’re ready, go ahead and select “c” from the diagnostics menu.
Test 501 Stack External Loopback - FAILED
NSX SXLB STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7
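If you capture the console session to a file, a small script can pull the result out of it. The following is only a rough sketch in Python: the function name parse_loopback_result is my own, the pattern is based solely on the one result line shown above, and I am assuming that a passing run says "PASSED" in the same position and that the UCR/DCR values are hexadecimal.

```python
import re

# Pattern based only on the single result line shown in this post;
# the "PASSED" wording for a good run is an assumption.
RESULT_RE = re.compile(r"Test\s+(\d+)\s+(.+?)\s+-\s+(PASSED|FAILED)")
# UCR/DCR are treated as hexadecimal register dumps (assumption, "A7" suggests hex).
REGISTER_RE = re.compile(r"\b(UCR|DCR)=([0-9A-Fa-f]+)")


def parse_loopback_result(console_text):
    """Return a dict describing the loopback result, or None if no result line is found."""
    match = RESULT_RE.search(console_text)
    if match is None:
        return None
    registers = {
        name: int(value, 16)
        for name, value in REGISTER_RE.findall(console_text)
    }
    return {
        "test": int(match.group(1)),
        "name": match.group(2),
        "passed": match.group(3) == "PASSED",
        "registers": registers,
    }


sample = (
    "Test 501 Stack External Loopback - FAILED "
    "NSX SXLB STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7"
)
result = parse_loopback_result(sample)
print(result["name"], "passed" if result["passed"] else "failed")
```

I don't know what the individual UCR and DCR bits mean, so the sketch only extracts the values so they can be compared between runs or between switches.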
In my case the 24-port Ethernet Switch 470 that I was using failed the loopback test. I then took a cascade cable that I knew to be working and repeated the test. It failed again, which indicates to me that the cascade module is faulty. If you were to select “e” from the diagnostics menu you might see something similar to the following:
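The process of elimination I used can be written down as a tiny decision function. This is just a sketch of my own reasoning, not anything from Nortel; the function name diagnose and the wording of the verdicts are made up for illustration.

```python
def diagnose(original_cable_passed, known_good_cable_passed=None):
    """Suggest a verdict from one or two loopback runs on the same switch.

    original_cable_passed: result of test 'c' with the cable taken from the stack.
    known_good_cable_passed: result of repeating the test with a cable known
    to be good, or None if that second run has not been done yet.
    """
    if original_cable_passed:
        # The loop exercised this switch's Up and Down ports and this cable.
        return "module and cable look OK; check the neighbouring unit"
    if known_good_cable_passed is None:
        return "inconclusive; repeat the test with a known-good cable"
    if known_good_cable_passed:
        return "original cable is suspect"
    return "cascade module is suspect"


# My case: failed with the original cable, failed again with a good one.
print(diagnose(False, False))  # prints "cascade module is suspect"
```

Note that a verdict is only as good as the "known-good" cable, which is exactly the limitation I warned about earlier.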
System Resets = 58. Burn-In Loops = 0. Burn-In Errors = 0.
Auto-Burn-In = DISABLED Diag Baud = 9600.
Error Log: Bad Port Mask = 80000000
Loop Test Error Description:
50 501 STAK: Stack Secondary Rx (1) Timed Out
50 501 STAK: Stack Upstream Clock Failed. Is Cascade Cable Missing?
50 501 STAK: Stack Secondary Rx (1) Timed Out
50 501 STAK: Stack Secondary Rx (1) Timed Out
50 501 STAK: Force Stack RNGO Low Failed Test=0 GCReg=60
50 501 STAK: Force Stack RNGO Low Failed Test=0 GCReg=60
56 501 STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7
56 501 STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7
58 501 STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7
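When the error log gets long it can help to tally the entries. Here is a minimal Python sketch that splits a captured log into entries and counts each message. I am assuming that "Loop", "Test" and "Error Description" are the three columns of each entry and that every entry carries the "STAK:" tag, as in the output above; the helper name parse_error_log is hypothetical.

```python
import re
from collections import Counter

# One entry = <loop> <test> STAK: <description>, where the description runs
# until the next entry or the end of the text. Column meaning is an assumption.
ENTRY_RE = re.compile(
    r"\b(\d+)\s+(\d+)\s+STAK:\s+(.*?)(?=\s+\d+\s+\d+\s+STAK:|$)"
)


def parse_error_log(text):
    """Return a list of (loop, test, description) tuples found in the log text."""
    return [
        (int(loop), int(test), description.strip())
        for loop, test, description in ENTRY_RE.findall(text)
    ]


log = (
    "Error Log: Bad Port Mask = 80000000 Loop Test Error Description: "
    "50 501 STAK: Stack Secondary Rx (1) Timed Out "
    "50 501 STAK: Stack Upstream Clock Failed. Is Cascade Cable Missing? "
    "50 501 STAK: Force Stack RNGO Low Failed Test=0 GCReg=60 "
    "56 501 STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7 "
    "58 501 STAK: Stack Upstream Clock Failed. UCR=27 DCR=A7"
)

entries = parse_error_log(log)
counts = Counter(description for _, _, description in entries)
for description, count in counts.most_common():
    print(count, description)
```

The pattern works whether the log arrives as one long line or with one entry per line, since the terminal capture may wrap differently from what is shown here.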
One very important note! You can only stack switches that are running the same version of software (boot code and agent code). I believe the “Base” light will blink amber if you try to stack two switches together that are not running the same software.
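If you keep an inventory of your switches, the version rule is easy to check before you ever cable up a stack. The sketch below is hypothetical throughout: the function name, the inventory layout and the version labels ("boot-A", "agent-B") are placeholders I made up, not real Nortel release numbers.

```python
def find_mismatched_units(units):
    """Return the unit numbers whose versions differ from the lowest-numbered unit.

    units maps a unit number to a (boot_code_version, agent_code_version) tuple.
    """
    if not units:
        return []
    reference = units[min(units)]
    return sorted(
        unit for unit, versions in units.items() if versions != reference
    )


# Placeholder version labels, not real release numbers.
stack = {
    1: ("boot-A", "agent-A"),
    2: ("boot-A", "agent-A"),
    3: ("boot-A", "agent-B"),
    4: ("boot-A", "agent-A"),
}
print(find_mismatched_units(stack))  # prints [3]
```

Comparing against the lowest-numbered unit is an arbitrary choice on my part; the point is only that both the boot code and the agent code must match across every unit.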
You can also confirm a cascade/stacking issue remotely using Nortel’s Device Manager. Here’s a screenshot of two Ethernet Switch 470s stacked together. You can see the yellow LEDs on Unit 1 Up and Unit 2 Down.
I will let you know that we’ve had our own share of cascade modules go bad over the past five years. While the cascade modules appear to be “replaceable”, they’re really not designed to be field serviceable. If a switch fails the cascade loopback test it’s really only good for standalone operation.