I thought I would share this story with everyone. We had discovered an issue with Ethernet frames being maligned/corrupted between the Motorola Access Port 300 (AP300) and the Motorola Wireless (WS5100) LAN Switch.
We had a ticket open with Motorola trying to understand why a significant number of our AP300s were rebooting themselves at odd hours during the early morning. Motorola had requested that we provide network traces at the Access Point and Wireless Switch. Surprisingly Motorola came back and pointed out that the payload in some of the Ethernet frames was getting modified between the Wireless Switch and the Access Port.
The fundamental equipment involved in this problem were as follows; Nortel Ethernet Switch 460 (ES 460), Ethernet Switch 470 (ES 470), Ethernet Routing Switch 5520 (ERS 5520), Ethernet Routing Switch 8600 (ERS8600); Motorola Wireless LAN Switch 5100 (WS5100) and Access Ports 300(AP300).
The Motorola WS5100s and AP300s are physically connected over the same Layer 2 Ethernet network. The “Ethernet 1” port on the WS5100 is connected to a Virtual Local Area Network (VLAN) which provides a single broadcast domain for all AP 300s to connect to the WS5100. The “Ethernet 2” port on the WS5100 is used as a trunk interface to bridge between the WLANs (wireless) and VLANs (wired) segments. We essentially have core switches and edge switches (distribution is collapsed down into the core). The core switch can be a single ERS8600 or a pair of ERS8600s (Layer 3) connected via an IST (Inter-Switch Trunk). At the edge we generally deploy ES470(Layer 2) or ERS5520(Layer 2). We have deployed ES460s (PoE) into closets where ES470s are already present to specifically support PoE and the wireless network.
Here is a quick topology of the network with respect to the WS5100s and AP300s.
We recently started deploying the ERS5520s (in place of the ES470s) which directly support PoE allowing us to deploy one less piece of equipment at the edge and also provides one less bridge (hop) to switch through.We have been plagued by a problem that is affecting the Motorola AP300s causing them to randomly reset and re-adopt at different times of the day without warning or cause. In searching for the cause of this problem we’ve documented numerous Ethernet frames being maligned as they travel from the AP300 to the WS5100.
With respect to the examples I’m going to draw the following topology applies;
It should be noted that we do use the ES460s and ERS5520s to remark the 802.1p bits in the Ethernet frame so we can provide some measure of QoS with respect to the Nortel (Spectralink) Wireless LAN phones that we currently have deployed. In essence we mark all Ethernet packets on the “APVLAN” with a QoS level of 4 (“Gold”, BoSS-65530).
Network Trace Analysis
I will refer to the following two trace files;
“ers460side1.pcap” closet ES460 trace
“ers8600side1.pcap” core ERS8600 trace
I tried to merge up the two traces so each trace is synchronous with the other. We’ll focus on packet 3, you can see in the closet ES460 trace that bytes 15 and 16 are 0x20 and 0x12 respectively.
Looking at the other trace you can see that bytes 15 and 16 are different than in the first trace. You can see that the bits in 16 have been shifted to bytes 26.
You can again see the same problem in packet 4;
You can see it again in packets 6, 7, 10, 39, 43, 45, etc.
In the end the problem turned out to be a software/hardware issue with the Nortel Ethernet Routing Switch 8600. If DiffServ was enabled on the Ethernet port that was being mirrored, the mirrored data was somehow getting corrupted in the process of copying the packets. Once we disabled DiffServ on the Ethernet port the problem disappeared. We opened a case with Nortel but were told that it would be handled as an enhancement request, not a correction request (go figure!).
I personally no longer trust either the port mirror or packet capture facilities of the Nortel ERS 8600 and rely on physical taps so there can be no doubt or questions about the validity of the capture data.
We still have issues with our Motorola AP300s rebooting from time to time but they have been much better since Motorola released v2.1.3 software for the WS5000/WS5100s. We are currently working with Motorola to resolve issues in their v3.x software line that is causing our Nortel 2211 (Spectralink) wireless phones to occasionally reboot while idle and roaming.
Cheers!