We generally perform software upgrades on all our routers and switches twice a year. It helps keep our network infrastructure current, and it also helps reduce unscheduled downtime.
Last fall we decided to skip the twice-yearly maintenance because there were just too many projects on the docket. This spring we came across a very interesting issue that we had never seen before. We started to notice that multiple Nortel Ethernet Switch 460/470 switches and stacks were rebooting themselves all over our network. It took us a few hours to realize that every switch that had rebooted had just passed roughly 500 days of uptime. All the affected switches were running FW 18.104.22.168 with SW v3.6.4.08. The switches were literally rebooting themselves in the same order in which they had been upgraded almost 500 days earlier.
I’m currently trying to confirm with Nortel that this “bug” has been removed from the 3.7.x software release.
This was one occasion where the network was just too good for itself.
Update: Tuesday June 10, 2008
I received a formal response from Nortel today that included the following:
Analysis of the issue :-
When the BS-470 switches reaches 497 days the system time rolls over and during this period management communication will be lost. This is caused by the use of a 32 bit counter, which when it rolls back to 0, initiates an internal software synchronization to align all timers. This is only loss of IP management and not switching functionality.
This issue still open and can be fixed by rebooting the switches before reaching the 497 day mark.
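For what it's worth, the 497-day figure falls out of simple arithmetic if you assume the counter ticks in hundredths of a second, the way SNMP sysUpTime (TimeTicks) does. Nortel's response only says "32 bit counter", so the tick rate here is my assumption, and the function names below are made up purely for illustration. A rough sketch:

```python
from datetime import timedelta

# Assumption: the counter increments 100 times per second, like SNMP TimeTicks.
TICKS_PER_SECOND = 100
COUNTER_BITS = 32


def rollover_period(bits=COUNTER_BITS, ticks_per_second=TICKS_PER_SECOND):
    """Time an unsigned counter of the given width takes to wrap back to 0."""
    return timedelta(seconds=(2 ** bits) / ticks_per_second)


def time_until_rollover(uptime_ticks, bits=COUNTER_BITS,
                        ticks_per_second=TICKS_PER_SECOND):
    """Time left before the counter wraps, given its current tick value."""
    remaining_ticks = 2 ** bits - (uptime_ticks % 2 ** bits)
    return timedelta(seconds=remaining_ticks / ticks_per_second)


if __name__ == "__main__":
    period = rollover_period()
    print(f"Counter wraps after {period.days} days")

    # Hypothetical switch that has been up for 490 days.
    ticks_at_490_days = 490 * 86400 * TICKS_PER_SECOND
    print(f"Time left at 490 days of uptime: {time_until_rollover(ticks_at_490_days)}")
```

Feeding the second function a sysUpTime value polled from each switch would be one way to flag the units that need a scheduled reboot before they hit the mark.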
When I inquired if the problem had been resolved in the v3.7.x software release I was told it had not. It would seem that a lot of folks just don’t expect switches to be running that long these days.
Update: Wednesday November 4, 2008
Last week Nortel released a technical service bulletin entitled, “Ethernet Routing Switches: SysUpTime approaching 497 days can cause the switch or stack to behave in some unexpected way”. They also released a video that documents a workaround to the problem.
Let me save you the time and effort of downloading either. Nortel’s solution is truly masterful: reboot the switch.
While I’ve been known to defend Nortel, there’s just no defense for this. I’m completely floored at Nortel’s response.