technology, networking and IP telephony
Posts tagged SOFTWARE
Avaya ERS 8600 Software 7.0 Draft Release Notes
Feb 25th
You’ll recall that back in December 2009 I wrote a post about the upcoming release of 7.0 software for the Ethernet Routing Switch 8600. At that time the software was in Controlled Availability (CA) and wasn’t expected to transition to General Availability (GA) until March 2010.
It seems that Nortel/Avaya are starting to get their ducks in a row as a draft version of the release notes hit Nortel’s support website over the past few days.
Here’s one known issue that grabbed my eye while browsing the release notes;
MLT/SMLT limitation(s)
Q01971344
In the case of a full-mesh SMLT configuration between 2 Clusters running OSPF (more likely an RSMLT configuration), because of the way that MLTs work in regards to CP generated traffic, it is highly recommended that the MLT port (or ports) that form the square leg of the mesh (versus the cross connect) be placed on lower numbered slot/port than the cross connections. The reason for this is that CP generated traffic is always sent out on the lowest numbered ports when active. Using this recommendation will keep some OSPF adjacency up if all the links of the IST fail. Otherwise the switches which have a failed IST could lose complete OSPF adjacency to both switches in the other Cluster and therefore become isolated.
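To make the recommendation a little more concrete, here is a minimal Python sketch of the selection rule as the release note describes it. It assumes that "lowest numbered" means ordering by (slot, port), and the function names and port layout are hypothetical illustrations, not anything taken from the switch software.

```python
from typing import Iterable, Optional, Tuple

Port = Tuple[int, int]  # (slot, port), e.g. (1, 1) for port 1/1


def cp_egress_port(members: Iterable[Port], active: Iterable[Port]) -> Optional[Port]:
    """Return the lowest-numbered active member port, or None if none are up.

    Assumption: ports are ordered by slot first, then by port number.
    """
    up = sorted(set(members) & set(active))
    return up[0] if up else None


def square_leg_is_preferred(square: Iterable[Port], cross: Iterable[Port]) -> bool:
    """True when every square-leg port sorts below every cross-connect port."""
    return max(square) < min(cross)


# Hypothetical layout: square leg on 1/1, cross connect on 2/1.
square_ports = [(1, 1)]
cross_ports = [(2, 1)]
mlt_members = square_ports + cross_ports

print(cp_egress_port(mlt_members, active=mlt_members))    # (1, 1)
print(square_leg_is_preferred(square_ports, cross_ports))  # True
```

With the square leg on the lower-numbered port, CP generated traffic such as OSPF hellos keeps using the square leg for as long as that port is active.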
Cheers!
Upgrade Nortel Ethernet Routing Switch 8600 to v5.1.1.1
Dec 12th
This week I took on the task of upgrading a pair of Nortel Ethernet Routing Switch 8600s from 4.1.8.0 to 5.1.1.1 software. I used the opportunity to ‘test’ out the upgrade process and procedure for a much larger site that I will be upgrading next week.
The site that I upgraded has two ERS 8600 switches (dual 8692SF w/Mezz) running 15 VLANs, OSPF, VRRP, IST/SMLT, 15 edge ERS 5520 switches, 150 IP phones, 300 personal computers, printers, etc. It’s a fairly small site but it’s all IP telephony with a CS1000B.
The site that I will be upgrading next week has two ERS 8600 switches (dual 8692SF w/Mezz) running 80 VLANs, OSPF, VRRP, BGP, PIM-SM, IST/SMLT, 36 edge switches/stacks, 50 IP phones, 4000+ personal computers, printers, etc. This is a much larger site running a lot more VLANs along with BGP and PIM-SM.
With all the issues I’ve run into these past 12 months regarding the ERS 8600 switch I must admit that I was pleasantly surprised. Nothing blew up, nothing broke (knock on wood), and everything just seemed to work – there’s a surprise. I thought I would share the steps I took and the process I used. I will readily admit that I don’t have any RS blades in my environment so customers with RS blades might want to think twice before upgrading to 5.1.1.1 software (this is based on my discussions with other Nortel customers and their experience with 5.1.1.1 and RS blades).
I’ll also need to follow-up late next week and let everyone know how the larger upgrade went.
How did I do it?
Well, I did it remotely but with access to the serial ports of all four CPUs (8692SF w/Mezz). I had to make a few configuration changes before I performed the upgrade; these are outlined in the release notes but I’ll touch on them here.
- Enable System Monitor (JDM, Edit Chassis -> System Flags and then look under “System Monitoring”)
- Reconfigure the SNMPv3 trap hosts with a retry value of 0 (read the release notes!)
I also took the opportunity to enable Jumbo Frame support with the ERS 8600 switch because that configuration change requires a reboot to take effect. (config sys set mtu 9600)
With those configuration changes saved I set out to copy the software up to the primary CPUs and then across to the standby CPUs. I took the extra precaution of copying all the software and configuration files from the FLASH to the PCMCIA card just in case something came up and I needed them there.
I started the process by upgrading the standby CPU (8692SFw/Mezz) on B switch first. From the primary CPU on the B switch I issued the following commands;
config bootconfig choice primary image-file /flash/p80a5111.img
config bootconfig choice secondary image-file /flash/p80a4180.img
save bootconfig
save config
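If you are repeating this on several switches, the commands above can be generated rather than typed. The following is a rough Python sketch, assuming the image file names always follow the pattern seen in this post (p80a or p80b, then the version digits with the dots removed). The helper names are hypothetical, and the pattern has only been checked against the 4.1.8.0 and 5.1.1.1 file names shown here.

```python
from typing import List


def image_name(kind: str, version: str) -> str:
    """Build a path such as /flash/p80a5111.img from a dotted version string.

    kind is 'a' for the runtime image or 'b' for the boot monitor image,
    matching the two file names that appear in this post.
    """
    digits = version.replace(".", "")
    return f"/flash/p80{kind}{digits}.img"


def bootconfig_commands(new_version: str, old_version: str) -> List[str]:
    """Return the four CLI lines: new image as primary, old image as fallback."""
    return [
        f"config bootconfig choice primary image-file {image_name('a', new_version)}",
        f"config bootconfig choice secondary image-file {image_name('a', old_version)}",
        "save bootconfig",
        "save config",
    ]


for line in bootconfig_commands("5.1.1.1", "4.1.8.0"):
    print(line)
```

Keeping the old image as the secondary choice is what leaves a fallback in place if the new image fails to load.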
With the configuration files changed and saved I connected to the standby CPU (peer telnet) and issued the command to upgrade the boot software on the standby CPU;
boot /flash/p80b5111.img
I watched the console as the CPU restarted and upgraded the boot flash;
################ 8K CPU BOOT FLASH Update ################
File obj-boot/p80b5111-mpc740.romH found in loaded image
File size: 786624 bytes
Number of flash sectors to be programmed: 7
WARNING: You are about to re-program your Boot Monitor FLASH image.
Do NOT turn off power or press reset until this procedure is completed.
Otherwise the card may be permanently damaged!!!
Press <Return> to stop monitor upgrade....
Erased 7 sectors of bootflash
Programmed BOOTFLASH Image
Verifying new BOOTFLASH Image
786624 matches, 0 mismatches
Updating Fileheader
Erased 1 sectors of bootflash
Fileheader update complete
Verifying new Fileheader
512 matches, 0 mismatches
Update complete!
Press return to reboot
The CPU then started to load the 5.1.1.1 software for the first time;
Copyright (c) 2009 Nortel, Inc.
CPU Slot 5: PPC 745 Map B
Version: 5.1.1.1
Creation Time: Sep 30 2009, 15:13:36
Hardware Time: DEC 11 2009, 02:16:31 UTC
Memory Size: 0x10000000
Start Type: cold
SMART MODULAR TECH SMART 221 CF
The /pcmcia device mounted successfully, but it appears to have been formatted with pre-Release5.1 file system code. Nortel recommends backing up the files from /pcmcia, and executing dos-format /pcmcia to bring the file system on the /pcmcia device to the latest ERS8600 baseline.
open_file:can't open "/pcmcia/pcmboot.cfg" 0x380003 S_dosFsLib_FILE_NOT_FOUND
/flash/ - Volume is OK
The /flash device mounted successfully, but it appears to have been formatted with pre-Release5.1 file system code. Nortel recommends backing up the files from /flash, and executing dos-format /flash to bring the file system on the /flash device to the latest ERS8600 baseline.
Loaded boot configuration from file /flash/boot.cfg
Attaching network interface lo0... done.
Press <Return> to stop auto-boot...
Loading /flash/p80a5111.img ... 12606133 to 43734492 (43734492)
Starting at 0x1000000...
SMART MODULAR TECH SMART 221 CF
Booting PMC280 Mezz HW. Please wait.....
The BootCode address is 0xe000100 3303 .
Mezz taking over console and modem.....
Mezz CPU Booted successfully
Initializing backplane net with anchor at 0x4100... done.
Backplane anchor at 0x4100... ..
Mounting /flash: .done.
License File <license.dat> does not exist
License File <license.dat> does not exist
License File <license.dat> does not exist
CPU6 [12/10/09 21:18:23] SW INFO Trial Period will expire in 60 days
Ethernet Routing Switch 8600 System Software Release 5.1.1.1
Copyright (c) 1996-2009 Nortel, Inc.
File does not exist
Critical Log file created
CPU6 [12/10/09 21:12:11] SW INFO System boot
CPU6 [12/10/09 21:12:11] SW INFO ERS System Software Release 5.1.1.1
CPU6 [12/10/09 21:12:11] SW INFO Waiting for cpu in slot 5 ... 2 seconds
CPU6 [12/10/09 21:12:13] SW INFO CPU card entering warm-standby mode...
CPU6 [12/10/09 21:12:16] SW INFO Loading configuration from /flash/config.cfg
**************************************************
* Copyright (c) 2009 Nortel, Inc. *
* All Rights Reserved *
* Ethernet Routing Switch 8010 *
* Software Release 5.1.1.1 *
**************************************************
With that the standby CPU was upgraded to 5.1.1.1 software and I was set to upgrade the primary CPU which would cause the switch to fail over to the standby CPU. When I issued the boot /flash/p80b5111.img command on the primary CPU the standby CPU (slot 5) became the master and I observed the following on the console;
CPU5 [12/10/09 21:20:23] HW INFO Stand-by CPU in slot # 5 becoming master...
CPU5 [12/10/09 21:20:28] MPLS INFO All MPLS components are up and active
CPU5 [12/10/09 21:20:28] HW INFO Card inserted: Slot=5 Type=8692SF
CPU5 [12/10/09 21:20:29] HW INFO Card inserted: Slot=6 Type=8692SF
CPU5 [12/10/09 21:20:29] SW INFO R-Module inserted: Slot=1 Type=8630GBR, waiting to bootup...
CPU5 [12/10/09 21:20:29] SW INFO R-Module inserted: Slot=2 Type=8648GTR, waiting to bootup...
CPU5 [12/10/09 21:20:29] SW INFO R-Module inserted: Slot=3 Type=8683XLR, waiting to bootup...
CPU5 [12/10/09 21:20:29] HW INFO Initializing 8692SF in slot #5 ...
CPU5 [12/10/09 21:20:31] HW INFO Initializing 8692SF in slot #6 ...
CPU5 [12/10/09 21:20:37] SW INFO Slot 1: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:20:37] SW INFO Slot 2: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:20:49] SW INFO Slot 3: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:16] SW INFO Slot 1: 8630GBR Initializing. Do not remove board.
CPU5 [12/10/09 21:21:16] SW INFO Slot 2: 8648GTR Initializing. Do not remove board.
CPU5 [12/10/09 21:21:21] SW INFO Slot 1: 8630GBR Initialization completed.
CPU5 [12/10/09 21:21:21] SW INFO Slot 2: 8648GTR Initialization completed.
CPU5 [12/10/09 21:21:22] SW INFO Slot 1: Restart new image version 5.1.1.1
CPU5 [12/10/09 21:21:22] SW INFO Slot 2: Restart new image version 5.1.1.1
CPU5 [12/10/09 21:21:30] SW INFO Slot 3: 8683XLR Initializing. Do not remove board.
CPU5 [12/10/09 21:21:31] SW INFO Slot 1: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:31] SW INFO Slot 2: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:36] SW INFO Slot 3: 8683XLR Initialization completed.
CPU5 [12/10/09 21:21:36] SW INFO Slot 3: Restart new image version 5.1.1.1
CPU5 [12/10/09 21:21:45] SW INFO Slot 3: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:49] SW INFO slot 2 found NP heartbeat - R-Module is online
CPU5 [12/10/09 21:21:51] SW INFO slot 1 found NP heartbeat - R-Module is online
CPU5 [12/10/09 21:22:12] SW INFO slot 3 found NP heartbeat - R-Module is online
CPU5 [12/10/09 21:22:12] HW INFO Initializing 8630GBR in slot #1 ...
SNMP-v3 VACM configuration is currently using default parameters. These parameters should be changed for maximum security.
SNMP-v3 Having more than one entry in Group-access table for the same group-name with different security levels can cause a security hole
WARNING: THE ALLOWED LOG FILE SIZE HAS EXCEEDED CONFIGURATION LIMITS. THE FILE SIZE IS CURRENTLY 1071131 BYTES!!!!
CPU5 [12/10/09 21:22:12] HW INFO Initializing 8648GTR in slot #2 ...
**************************************************
* Copyright (c) 2009 Nortel, Inc. *
* All Rights Reserved *
* Ethernet Routing Switch 8010 *
* Software Release 5.1.1.1 *
**************************************************
Login:
CPU5 [12/10/09 21:22:13] HW INFO Initializing 8683XLR in slot #3 ...
CPU5 [12/10/09 21:22:14] SW INFO Loading configuration from /flash/config.cfg
CPU5 [12/10/09 21:22:15] SW INFO NTP Enabled
CPU5 [12/10/09 21:22:15] SW INFO The system is ready
CPU5 [12/10/09 21:22:15] SNMP INFO Booted with PRIMARY boot image source - /flash/p80a5111.img
CPU5 [12/10/09 21:22:17] SW INFO All the configured hosts not reachable
CPU5 [12/10/09 21:22:17] SW INFO A new log file = /pcmcia/ccf00005.001 is created
CPU5 [12/10/09 21:22:17] SW INFO PCMCIA card detected in Master CPU "sw-8600-ccr-a.site1.acme.org" slot 5, Chassis S/N SSPN6C0ANC
CPU5 [12/10/09 21:22:17] SNMP INFO Chassis with Power Supply redundancy
CPU5 [12/10/09 21:22:17] SNMP INFO Fan Up(FanId=1, OperStatus=2)
CPU5 [12/10/09 21:22:17] IP INFO the VRF OSPF Md5 key file 1 does not exist
CPU5 [12/10/09 21:22:17] SNMP INFO Fan Up(FanId=2, OperStatus=2)
CPU5 [12/10/09 21:23:03] SNMP INFO CPU switch over, stand-by CPU becoming master
CPU5 [12/10/09 21:23:03] SNMP INFO Sending Warm-Start Trap
CPU5 [12/10/09 21:23:03] SNMP INFO CPU switch over, stand-by CPU in slot # 5 became master
CPU5 [12/10/09 21:23:35] SNMP INFO Communication established with backup CPU
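For anyone who wants to time the failover from a saved console capture, here is a small Python sketch that parses log lines in the format shown above and measures the gap between two messages. It assumes the timestamp is month/day/year, and the regular expression and function names are my own illustration rather than any Nortel tool.

```python
import re
from datetime import datetime

# Matches lines such as: CPU5 [12/10/09 21:20:23] HW INFO Some message text
LOG_LINE = re.compile(
    r"^(?P<cpu>CPU\d+) \[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] "
    r"(?P<module>\S+) (?P<level>\S+) (?P<text>.*)$"
)


def parse(line):
    """Return (timestamp, module, level, text), or None for non-log lines."""
    m = LOG_LINE.match(line.strip())
    if not m:
        return None
    # Assumption: dates are written month/day/two-digit year.
    ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
    return ts, m.group("module"), m.group("level"), m.group("text")


def seconds_between(lines, start_text, end_text):
    """Seconds from the first line containing start_text to the next line containing end_text."""
    start = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, _, _, text = rec
        if start is None and start_text in text:
            start = ts
        elif start is not None and end_text in text:
            return (ts - start).total_seconds()
    return None


sample = [
    "CPU5 [12/10/09 21:20:23] HW INFO Stand-by CPU in slot # 5 becoming master...",
    "Login:",
    "CPU5 [12/10/09 21:22:15] SW INFO The system is ready",
]
print(seconds_between(sample, "becoming master", "The system is ready"))  # 112.0
```

Going by the timestamps in the capture above, the standby CPU took just under two minutes from taking over as master to reporting that the system was ready.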
I will comment that for the time that the standby CPU was running 5.1.1.1 software and the primary CPU was still running 4.1.8.0 software the console became very sluggish and unresponsive. The CPU utilization also surged to 97%. I suspect the CPUs didn’t like trying to communicate with each other being so far apart on software releases.
I didn’t need to do anything special while upgrading the switches to have the other switch in the cluster maintain the network. The IST link was re-established between upgrading the switches when B was running 5.1.1.1 and A was still running 4.1.8.0. I just repeated the steps above for the A switch and everything worked just fine.
I did go back and clean up the log files; you might have noticed the warning in there about the log file being full. I didn’t reformat the /flash or /pcmcia filesystems because I wanted the option to downgrade if necessary. I can reformat those filesystems at a later point in time if the software proves stable and reliable.
I’m impressed with 5.1.1.1 so far, let’s see how it stands the test of time.
Cheers!
Update: Thursday December 17, 2009
I completed the 5.1.1.1 upgrade of the larger site I referred to above on Wednesday morning and I’m happy to report that everything is working well. I did get an initial scare when one of the core ERS 8600 switches (running 4.1.6.3) went belly up just before I started the upgrade. I had just completed re-configuring the VRRP interfaces on both ERS 8600 switches so the VRRP IDs would be unique. While the switch was still forwarding Layer 2 traffic it stopped processing all Layer 3 traffic, wouldn’t respond to ICMP ping and the IST went down.
The upgrade itself took longer than it usually does, especially with the SuperMezz cards installed. Once the switch came up, though, everything was working fine: OSPF, BGP, FDB, ARP, IST, SMLT, PIM-SM, etc.
I’m very hopeful that 5.1.1.1 will provide some much needed stability to the ERS 8600 switch!
Cheers!
Nortel ERS 5500 Software 5.1.5 Available
Jul 30th
Nortel has released software 5.1.5 for the Nortel Ethernet Routing Switch 5500 series switches. There are no new features in the software release but a number of fixes.
- When unknown multicast no flood filter was enabled the multicast packets used by OSPF were blocked (Q02007873)
- Uplink fiber port was set to “Custom” instead of “Enabled” after code upgrade to 5.1.3 (Q02023262)
- Stack did not properly pass MIB values for Auth-Status (Q02011169)
- HTTP web-server crashed when running specific security test (Q02004709)
- Under certain conditions, when a new MLT was configured, traffic did not properly flow through both links of the MLT (Q02011420)
- Link did not come up when specific SFPs were used on (Q01966044)
- SSH login accepted any username except blank (Q02010762)
- Exception Error with Data Access Task Name “tIdt” (Q02024889)
You can find the release notes at the Nortel website.
Cheers!
Ethernet Routing Switch 8600 Software Release v4.1.8.2
Feb 9th
It’s finally official… Nortel has released v4.1.8.2 software for the Ethernet Routing Switch 8600. This latest code promises to put all the ARP/FDB issues that surfaced in the 4.1.6.x software branch to rest. It also promises to provide increased efficiencies for those running switch clusters (IST). I’ve been running 4.1.8.0 software for the past 30+ days and believe it’s a stable release that customers can finally count on. The one word of warning for everyone out there revolves around VRRP IDs: you must make sure you have unique VRRP IDs across your entire switch.
Anyone considering an upgrade should read the release notes carefully since there are a number of significant changes to the code.
You can find a copy of the release notes here but you’ll obviously need to visit the Nortel site to download the software.
Here’s an excerpt regarding the changes around SMLT/RSMLT;
New Features in This Release
SMLT/RSMLT Operational Improvements (CR Q01764193/Q01769324/Q01776485)
For previous SMLT operation, bringing the SMLTs Up/Down triggered the flushing of the entire MAC/FDB belonging to the SMLTs in both the IST Core Peer Switches. Flushing of the MAC addresses then causes the dependent ARP (for IP stations) to be re-resolved. For ARP resolution, ERS 8600 re-ARPs for all the SMLT learned ARPs. This created a major MAC/ARP re-learning effort. As the records were flushed, during the relearning period the exception (learning) packets will also be continuously forwarded to the CPU, thereby increasing the CPU load. This would further slow-down the SMLT re-convergence as well as the h/w record reprogramming. Since proper traffic flow with an ERS 8600 is completely dependent on the h/w records, this prior behavior could adversely affect convergence times, especially in very large networks (8000+ MACs/ARPs), and those networks also running with many multicast streams, as multicast streams often need to be forwarded to the CPU for learning, thereby also increasing CPU load.
The SMLT changes in this release improve this operation significantly, and continue to allow all previous SMLT/RSMLT topologies to be supported. SMLT Operational Improvements will affect SMLT/RSMLT behavior in that the actual SMLT/RSMLT connection links on a powered-up IST Core Switch, will take longer to become active (link status up and forwarding) than with previous versions of code. During this time period the other Peer IST Core Switch will always be continuing to forward, therefore avoiding any loss of traffic in the network for all SMLT/RSMLT based connections. The SMLT/RSMLT associated links will not become active upon a boot-up until the IST is completely up, and a ‘new MAC/ARP/IP sync’ has occurred between the two Core IST Peer Switches in a Cluster.
Users may see occasional instances where the Remote SMLT Flag is False on both Peer Switches. This is normal, if the flag clears and is then set properly (False on one side, True on the other), once the FDB age-out for that associated VLAN has occurred. This behavior has no effect on user traffic operation – no user traffic loss or disruption will be seen under this condition.
For proper network behavior Nortel recommends to operate both IST switches with either the “new” or “old” SMLT architecture. Therefore SMLT operation between IST Peer Core switches with one switch operating with pre-4.1.8.x code, and the other operating with 4.1.8.x or later code is NOT supported. Additionally users will see some new informational log messages generated around this behavior. The new message formats are listed below, along with the various situations they will be seen with.
Case 1: Switch running SMLT is reset. Upon switch coming up the below messages are displayed irrespective of the number of SMLTs:
CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally
Case 2: System is up and running but SMLT UP event (from down) has happened. One sync message is displayed for every SMLT that went down and has come up. In the following example, 2 x SMLTs went down and came up:
CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally
CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally
NOTE: To determine which specific SMLT IDs are affected, look for the SMLT ID down/up log messages.
Case 3: When sync fails due to difference in IST Peer software version (pre-4.1.8.x and 4.1.8.x) where one peer supports MAC/ARP sync but the other does not. Or some other potential issue, such as a mis-configuration or IST Session not coming up. The system that is reset and is requesting sync, it will keep all the ports locked down (except IST_MLT) until the IST comes up properly and sync has occurred. After 5 minutes the below Log/Error messages will be displayed:
CPU5 [05/15/08 05:28:51] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
< After 5 min>
CPU5 [05/15/08 05:33:55] MLT ERROR SMLT initial table sync is delayed or failed. Please check the peer switch for any partial config errors. All the ports in the switch except IST will remain locked.
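If you collect these messages with a syslog server, the three cases can be told apart with a few lines of Python. This is only a sketch built on the message text quoted above; the function names are hypothetical and the matching is plain substring search, so treat it as a starting point rather than a monitoring tool.

```python
def outstanding_syncs(log_lines):
    """Count sync requests that have not yet been matched by a completion message."""
    pending = 0
    for line in log_lines:
        if "Sync Requested" in line:
            pending += 1
        elif "Sync is Complete" in line and pending > 0:
            pending -= 1
    return pending


def sync_failed(log_lines):
    """True if the delayed-or-failed error from Case 3 appears in the log."""
    return any("initial table sync is delayed or failed" in line for line in log_lines)


# Case 2 from the release notes: two SMLTs went down and came back up.
case2 = [
    "CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time",
    "CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time",
    "CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally",
    "CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally",
]

# Case 3: the sync never completes and the error appears after 5 minutes.
case3 = [
    "CPU5 [05/15/08 05:28:51] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time",
    "CPU5 [05/15/08 05:33:55] MLT ERROR SMLT initial table sync is delayed or failed. Please check the peer switch for any partial config errors. All the ports in the switch except IST will remain locked.",
]

print(outstanding_syncs(case2), sync_failed(case2))  # 0 False
print(outstanding_syncs(case3), sync_failed(case3))  # 1 True
```

A non-zero count of outstanding syncs is the cue to leave the peer switch alone, which is what the CAUTION text in the log message is asking for.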
NOTE: All known failover times for SMLT/RSMLT operation are now, and always have been sub-second. With this release all known fail-back or recovery times have been improved, especially for very large scaled environments to be within 3 seconds, in order to provide required redundancy for converged networks. These values are for unicast traffic only. Not all IP Multicast failover or fail-back/recovery situations can provide such times today, as many situations depend on the IPMC protocol recovery. For best IPMC recovery in SMLT/RSMLT designs, the use of static RPs for PIM-SM is recommended, with the same CLIP IP address assigned to both Core IST Peers within the Cluster, and to all switches with a full-mesh or square configuration. Failover or fail-back/recovery times for any situations that involve high-layer protocols can not always be guaranteed. Reference the Network Design Guide for your specific code release for recommendations on best practices to achieve best results. In many situations, it is abnormal corner case events for which times are extended. As well for all best results, VLACP MUST also be used. The SMLT/RSMLT improvements noted here have been optimized to function always with VLACP. Therefore for best results a pure Nortel SMLT/RSMLT design is best. We still support SMLT designs with any non-Nortel devices that support some level of link aggregation, but fail-back/recovery times can not be guaranteed.
NOTE: VLACP configuration should now use values of 500 msec short timer (or higher) and a minimum timeoutscale of 5. Lower values can be used, but should any VLACP ‘flapping’ occur, the user will need to increase one or more of the values. These timers have been proven to work for any large scaled environments (12,000 MACs), and also provide the 3 second recovery time required for converged networks (5 x 500 = 2.5 seconds). Using these values may not increase re-convergence or fail-back/recovery times, but instead guarantee these times under all extreme conditions. (CR Q01925738-01 and Q01928607) As well, users should note that if VLACP is admin disabled on one side of the link/connection, this will cause VLACP to bring the associated remote connection down, but since the remote connection will keep link up, the side with VLACP admin disabled, will now have a black-hole connection to the remote switch, which will cause a drop of all packets being sent to it. If VLACP is disabled on one side of a connection, it MUST also be disabled on remote side or else traffic loss will likely occur. The same applies to LACP configurations for 1 port MLTs as well.
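The timer arithmetic in that note is simple enough to check in a couple of lines. Here is a minimal Python sketch of it, assuming the detection time is just the short timer multiplied by the timeout scale, as the "5 x 500 = 2.5 seconds" example implies; the function names are hypothetical.

```python
def vlacp_detection_seconds(short_timer_ms, timeout_scale):
    """Time before VLACP declares the link down: short timer x timeout scale."""
    return short_timer_ms * timeout_scale / 1000.0


def meets_recommendation(short_timer_ms, timeout_scale):
    """True when both values are at or above the recommended 500 msec and 5."""
    return short_timer_ms >= 500 and timeout_scale >= 5


print(vlacp_detection_seconds(500, 5))  # 2.5
print(meets_recommendation(500, 5))     # True
print(meets_recommendation(400, 3))     # False
```

The recommended values give 2.5 seconds, which fits inside the 3 second recovery target quoted in the release notes.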
NOTE: If using VRRP with SMLT, users are now HIGHLY (MUST) recommended to use unique VRIDs, especially when scaling VRRP (more than 40 instances). Use of a single VRID for all instances is supported within the standard, but when such a configuration is used in scaled SMLT designs, instability could be seen. A [better] alternative method, which allows scaling to maximum number of IP VLANs, is to use RSMLT designs instead. See Section 10 in this Readme (page 10) for additional information on how to easily move from VRRP design to RSMLT design.
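Checking for duplicate VRIDs by eye gets tedious once you have dozens of VLANs. Below is a short Python sketch that flags any VRID used on more than one VLAN. The input is a hypothetical list of (VLAN ID, VRID) pairs that you would have to pull out of your own configuration; nothing here reads a switch config directly.

```python
from collections import defaultdict


def duplicate_vrids(vrrp_instances):
    """Map each VRID used more than once to the sorted list of VLANs using it."""
    by_vrid = defaultdict(list)
    for vlan_id, vrid in vrrp_instances:
        by_vrid[vrid].append(vlan_id)
    return {vrid: sorted(vlans) for vrid, vlans in by_vrid.items() if len(vlans) > 1}


# Hypothetical example: VLANs 10 and 20 both use VRID 1.
instances = [(10, 1), (20, 1), (30, 3)]
print(duplicate_vrids(instances))  # {1: [10, 20]}
```

An empty result means every VRRP instance on the switch has its own VRID, which is the state the release notes ask for.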
NOTE: For any SMLT design, for L2 SMLT VLANs, it is now HIGHLY recommended to change the default VLAN FDB aging timer from its default value of 300 seconds, to now be 1 second higher than the system setting for the ARP aging timer. FDB timers are set on a per VLAN basis. If using the default system ARP aging time, config ip arp aging, of 360 (minutes) then the proper value for the FDB aging timer, config vlan x fdb-entry aging-time, should be 21601 seconds, which is 360 minutes (6 hours) plus 1 second. This will have the system only use the ARP aging timer for aging, versus the FDB aging timer. This value has been shown to work very well to assure no improper SMLT learning. The use of this timer has one potential side-effect. For legacy modules, this limits the system to around a maximum of 12,000 concurrent MACs; for R-mode systems, the limit remains at 64K, even with the timer setting. With this timer, should an edge device move, the system will still immediately re-learn and re-populate the FDB table properly, and not have to wait for the 6 hour (plus 1 second) timer to expire. No negative operational effects are known when using this timer value. For non-SMLT based VLANs the default FDB aging timer of 300 may be used or can be changed or even also set to 21601. For this reason the default value of the FDB aging timer will remain at 300 (seconds), within all code releases.
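The 21601 figure is just the ARP aging time converted to seconds plus one. Here is a small Python sketch that does the conversion and prints the per-VLAN command, using the config vlan x fdb-entry aging-time syntax quoted in the release notes. The function names and the VLAN IDs are hypothetical.

```python
def fdb_aging_seconds(arp_aging_minutes=360):
    """FDB aging time: the ARP aging time in seconds, plus 1 second."""
    return arp_aging_minutes * 60 + 1


def fdb_command(vlan_id, arp_aging_minutes=360):
    """Build the per-VLAN CLI line for the recommended FDB aging time."""
    seconds = fdb_aging_seconds(arp_aging_minutes)
    return f"config vlan {vlan_id} fdb-entry aging-time {seconds}"


print(fdb_aging_seconds())  # 21601

# Hypothetical list of L2 SMLT VLANs.
for vlan in (10, 20, 30):
    print(fdb_command(vlan))
```

If you have changed the system ARP aging time from its default, pass that value in minutes so the FDB timer stays one second above it.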
Cheers!
