Michael McNamara

HPE/Aruba Instant Access Points – mixing models on the same virtual controller

Michael McNamara — Tue, 02 Nov 2021 23:32:49 +0000

In the past if you wanted to mix an Aruba IAP-100 series and an Aruba IAP-200 series in the same network and virtual controller you had to make sure that both APs were running the same software/firmware revision prior to trying to pair them together. If you didn’t you’d end up with one AP becoming the virtual controller and the other one would just continually reboot trying to join the virtual controller because it was unable to upgrade itself as the software image between classes/models is different.

I recently discovered that this is no longer an issue… APs that are not managed by Airwave (AMP) will reach out to the Internet (Aruba Central? or Aruba Activate?) and upgrade themselves without issue to whatever version the virtual controller is running. And APs that are managed by Airwave will also upgrade themselves so long as the upgrade image is downloaded and installed into AMP for the APs to retrieve.

This is a really nice feature, and helps simplify break-fix issues when older APs die and need to be replaced but you don’t have any IAP-135s available. Now you can use IAP-215s or any 200 series APs and whether or not you have Airwave your AP will be upgraded to the correct software to work properly.

You can mix and match APs based on software release: IAP-135s and IAP-215s running 6.4.x software work well together, as will IAP-215s, IAP-315s and even IAP-515s running 8.6.x software.

Cheers!

Update: Friday November 11, 2021

There is a known issue with older software releases that will break the ability to upgrade from the cloud. The AP in question needs to be on a “newer” release in order to establish an SSL session to the cloud. Additional details can be found in Aruba Support Advisory ARUBA-SA-20191219-PLVL08 titled Aruba Instant Certificate Expiry Issue.

Automation – Poor Mans Style

Michael McNamara — Fri, 16 Aug 2013 13:00:17 +0000

There has been a lot of discussion recently in networking circles surrounding automation, especially in discussions about Software Defined Networking (SDN). While automation means different things to different people, I would define it as any tool or solution that automates repetitive tasks (making the job easier) while making the output more consistent and ultimately the network more reliable. I’m a huge proponent of having the computer do the work, and I guess that could be defined as automation.

The purpose of this post is to provide some simple examples of how you can start automating today. These are not glamorous solutions hence the poor man slogan but they should help provide some idea of what’s possible. There are plenty of open-source and commercial solutions out there, one that’s been receiving some extra press these past few months is Puppet.

In my current organization we deploy a lot of equipment and we usually do so on a very tight timetable where we have hours, not days or weeks, to turn up a closet or a remote site. So our time is extremely precious, but more so we can’t afford to be troubleshooting configuration errors that could easily be avoided with some simple automation. Like numerous organizations before us we too had Microsoft Word templates and Excel macros and formulas, but we almost always ran into problems with the human element of the equation.

I took a small 1Gbps CentOS Linux guest with a LAMP (Linux, Apache, MySQL, PHP) stack and started throwing together some Perl, PHP and JavaScript code. The outcome was a pretty powerful example of what’s possible without a big capital investment or some consulting company reaching their quarterly sales goal on your dime.

Here are three simple examples which are adaptations of each other, adding additional features as time allowed and the solutions matured.

Juniper SRX – VPN Branch Offices

While we were migrating our remote branch offices (31+ locations in all) to Juniper SRX Service Gateways we quickly realized we needed a more reliable solution than building the configuration by hand. We had a Microsoft Word template that had various fields marked {RED}; the field engineer would perform a search-n-replace to ultimately build the configuration. In our first few conversions we had a number of typos in the configuration that caused us to overrun our scheduled maintenance window. How can we make configuring the Juniper SRX easier for our field engineers? What about a web based portal that takes in the assorted variables and outputs a working configuration?

The solution was really quite easy and has been done by others before. The field engineer plugs in a few values and the Perl/PHP application spits back a complete configuration for both the branch office Juniper SRX 210H and the main office Juniper SRX 650. The initial version of the application required the field engineer to enter a random 128 character shared key, later versions of the application automatically generated a random shared key for use in the configuration. This approach completely eliminated any other configuration issues during the migration project and is now part of our standard process for a new greenfield site.
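As a rough illustration of the idea (not the original Perl/PHP application), the sketch below fills a template from a handful of values and generates the 128 character shared key automatically. The placeholder names, the two sample configuration lines and the function names are all assumptions made for this example.

```python
import secrets
import string

# Hypothetical two-line template; the real template and its fields are not
# shown in this post, so the placeholders here are illustrative only.
BRANCH_TEMPLATE = string.Template(
    "set system host-name $hostname\n"
    'set security ike policy $policy pre-shared-key ascii-text "$shared_key"\n'
)

KEY_ALPHABET = string.ascii_letters + string.digits


def generate_shared_key(length: int = 128) -> str:
    """Return a random alphanumeric key drawn from a secure random source."""
    return "".join(secrets.choice(KEY_ALPHABET) for _ in range(length))


def render_branch_config(values: dict) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    values = dict(values)  # copy so the caller's dict is not modified
    values.setdefault("shared_key", generate_shared_key())
    return BRANCH_TEMPLATE.substitute(values)
```

Because `substitute()` refuses to render when a placeholder has no value, a forgotten field fails loudly instead of producing a half-edited configuration, which is exactly the class of typo the search-n-replace approach suffered from.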

Avaya Ethernet Routing Switch 4850GTS-PWR+

On the heels of that migration we had a very large expansion project underway at our largest facility. The physical construction called for the installation of 63+ Avaya Ethernet Routing Switch 4850GTS-PWR+ switches. In order to help streamline the configuration process and help eliminate configuration errors I built an adaptation of the earlier application to fit the requirements for this project. In this project I expanded the functionality of the original application by adding JavaScript code to perform client-side data validation. If the field called for an IP address, then the JavaScript code would only submit the data to the server if the field passed validation. It was pretty straightforward and simple, but we took the original solution and improved on it.
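The original validation was client-side JavaScript, which isn't shown here. The same check can be sketched in Python using the standard `ipaddress` module; the field names and the `validate_form` helper are hypothetical.

```python
import ipaddress


def is_valid_ipv4(value: str) -> bool:
    """True if value parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(value.strip())
    except ValueError:
        return False
    return True


def validate_form(fields: dict, ip_fields: set) -> dict:
    """Return {field: error message} for every IP field that fails validation."""
    errors = {}
    for name in ip_fields:
        if not is_valid_ipv4(fields.get(name, "")):
            errors[name] = f"{name} is not a valid IPv4 address"
    return errors
```

An empty result means the form can be submitted; anything else is a list of fields to send back to the engineer. Repeating the check on the server is cheap insurance, since client-side validation can be bypassed.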

APC UPS/PDU Management Cards

In that same expansion project we also identified the need to streamline the configuration of the American Power Conversion (APC) UPSs and PDUs that we were deploying throughout the infrastructure. If you’ve ever worked with them you know they can be somewhat difficult to quickly and easily configure. Our field engineers were spending on average 1 hour to configure each device and often there were inconsistencies in the configuration depending on which field engineer had performed the configuration. So we came up with a new streamlined process which allows the engineer to complete the task in about 15 minutes. The field engineer manually configures a DHCP reservation (manual DHCP) utilizing the MAC address of the management card within our Infoblox IP address management solution. Once the UPS or PDU is online and communicating with the network the field engineer plugs a number of variables into the web browser and the Perl application outputs the configuration. In this case we decided to take this solution one step further by having the Perl application actually program the configuration into the device. The Perl application generates the configuration and then makes an FTP call to the actual asset and uploads the configuration. The only thing left for the field engineer was to perform some simple tests once the task was complete, to verify that the asset was reporting and sending SNMP traps to our management platform. And even that last step could probably have been easily automated.
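The generate-then-upload step might look something like the sketch below, which uses Python's standard `ftplib` rather than the original Perl. The section and key names, the `config.ini` file name and the `ftp_factory` parameter are assumptions for illustration; check the management card documentation for what your firmware actually expects.

```python
import io
from ftplib import FTP


def build_config(values: dict) -> str:
    """Render a small INI-style fragment from the engineer's input."""
    # Section and key names are placeholders for this example.
    lines = ["[SystemID]"]
    lines += [f"{key}={values[key]}" for key in ("Name", "Contact", "Location")]
    return "\n".join(lines) + "\n"


def upload_config(host, user, password, config_text,
                  remote_name="config.ini", ftp_factory=FTP):
    """Push the rendered configuration to the device over FTP."""
    payload = io.BytesIO(config_text.encode("ascii"))
    # ftp_factory exists so the upload can be exercised without a real device.
    with ftp_factory(host, timeout=30) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f"STOR {remote_name}", payload)
```

Keeping rendering and upload as separate functions means the generated text can be reviewed or archived before anything touches the device.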

My Thoughts

There are a number of frameworks that I could have used in writing these applications but I decided to keep it simple (this time around). The point here is to just provide an example of what’s possible. There are quite a few tools and solutions in the marketplace that already leverage SNMP, NETCONF, XML, SOAP APIs, etc. to help provide integration between systems as well as management and automation.

Wouldn’t it be great if the last application accepted the MAC address of the APC UPS/PDU and made an automated call to Infoblox and automatically created a DHCP reservation for that asset? Thereby streamlining the process even further? There’s nothing stopping me from doing that other than the time and energy it takes to code the solution and then test it appropriately.

I’m not ready right now to release the actual code but if enough people request it I will work on creating sanitized copies and release the code under a GPL license.

Let me know what you’re doing around automation.

I recall a number of interesting posts a few years back where some folks had completely automated how they inventory and on-board their IP phones. They were using bar code scanners to collect the information from the outside of the box and then had an automated process for taking that information and creating the necessary configuration files for a zero-touch installation, including the actual node and TN information for the Avaya Communication Server 1000. That was a pretty neat example of automation in my opinion and obviously saved them a lot of time and effort.

Cheers!

Avaya Ethernet Routing Switch 4800 – Part 2

Michael McNamara — Tue, 16 Jul 2013 00:59:16 +0000

A few months ago I wrote about issues with the SNMP MIBs for the Avaya Ethernet Routing Switch 4800; unfortunately the problem didn’t stop there. Last week I finally found the time to troubleshoot a problem with one of our internal applications that provides a list of idle ports for each switch/stack. I wrote this application back in 2003; it uses Perl and SNMP to query the ifInOctets MIB-II counter for each switch port. The application stores that value between runs and generates a daily report that includes a list of ports that haven’t changed in 45 days. We assume that if the port hasn’t been active in 45 days it’s idle and can be reused (un-patched in the closet).
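The idle-port logic described above can be sketched in a few lines. This is not the original Perl script; the in-memory `state` dictionary and the function names are assumptions, and a real version would persist the state between daily runs.

```python
from datetime import date, timedelta

IDLE_DAYS = 45


def update_state(state: dict, port: str, counter: int, today: date) -> dict:
    """Record the counter and remember the last day on which it changed."""
    previous = state.get(port)
    if previous is None or previous["counter"] != counter:
        state[port] = {"counter": counter, "last_change": today}
    return state


def idle_ports(state: dict, today: date, idle_days: int = IDLE_DAYS) -> list:
    """Ports whose counter has not changed for at least idle_days days."""
    cutoff = today - timedelta(days=idle_days)
    return sorted(p for p, rec in state.items() if rec["last_change"] <= cutoff)
```

Note that the approach only works if an unchanged counter really means no traffic. If the switch returns a different bogus value on every poll, every port looks busy and nothing ever ages out.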

The application was the original suspect, and since I wrote it years back I was asked to look at the problem. Whenever we add a new model of switch, be it a Cisco Nexus 2248TP or Avaya ERS 4850-GTS-PWR+ there’s usually some tweaking involved to make sure that everything works properly. That’s the price you pay by writing your own software solutions. This time around however it became clear pretty quickly that something else was wrong. Initially I was puzzled since every snmpwalk I performed on the ERS 4850 returned the proper values. It wasn’t until I crafted a command line with multiple SNMP OIDs (just like the script) that I was able to observe the problem.

The problem appears to be related to how the Avaya ERS 4850-GTS-PWR+ handles SNMP queries with multiple SNMP OIDs included in the same request. If I perform an SNMP query for each of the following OIDs in the same request I get the same incorrect ifInOctets value back for each port.

1.3.6.1.2.1.2.2.1.1.38 – ifIndex
1.3.6.1.2.1.2.2.1.10.38 – ifInOctets
1.3.6.1.2.1.2.2.1.3.38 – ifType

Notice how the value is the same for every port, although if I re-query the switch it will provide a different value for every port. In short the incorrect value breaks the application since it appears that every port is changing daily and no ports are ever becoming idle.

[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifIndex.1 ifInOctets.1 ifType.1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifInOctets.2 = Counter32: 1106547808
IF-MIB::ifType.2 = INTEGER: ethernetCsmacd(6)

[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifIndex.2 ifInOctets.2 ifType.2
IF-MIB::ifIndex.3 = INTEGER: 3
IF-MIB::ifInOctets.3 = Counter32: 1106547808
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)

[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifIndex.3 ifInOctets.3 ifType.3
IF-MIB::ifIndex.4 = INTEGER: 4
IF-MIB::ifInOctets.4 = Counter32: 1106547808
IF-MIB::ifType.4 = INTEGER: ethernetCsmacd(6)

If I issue an SNMP get-next for just the single OID then the switch returns the correct value:

[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifInOctets.1  
IF-MIB::ifInOctets.2 = Counter32: 3903266154

[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifInOctets.2
IF-MIB::ifInOctets.3 = Counter32: 2492668434

[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifInOctets.3
IF-MIB::ifInOctets.4 = Counter32: 792830238

The result is the same whether I use SNMP v1 or SNMP v2c.
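One way to spot this symptom automatically is to parse the command output and flag any counter value that shows up on more than one interface. The sketch below assumes the Net-SNMP text output format shown above; the function name and `SAMPLE` constant are made up for this example.

```python
import re
from collections import defaultdict

LINE = re.compile(r"IF-MIB::ifInOctets\.(\d+) = Counter32: (\d+)")

# The three multi-OID responses from above, trimmed to the counter lines.
SAMPLE = """\
IF-MIB::ifInOctets.2 = Counter32: 1106547808
IF-MIB::ifInOctets.3 = Counter32: 1106547808
IF-MIB::ifInOctets.4 = Counter32: 1106547808
"""


def suspicious_counters(output: str) -> dict:
    """Map each counter value seen on more than one ifIndex to those indexes."""
    seen = defaultdict(list)
    for match in LINE.finditer(output):
        index, value = int(match.group(1)), int(match.group(2))
        seen[value].append(index)
    return {value: ports for value, ports in seen.items() if len(ports) > 1}
```

Running it on `SAMPLE` returns `{1106547808: [2, 3, 4]}`, while the single-OID responses produce an empty result. Ports that have genuinely never passed traffic can legitimately share a value of 0, so treat a hit as a prompt to investigate rather than proof of the bug.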

The script itself really isn’t concerned with precision, we actually only record the last 6 digits of the counter. If we were concerned about precision we might have to start utilizing ifHCInOctets (1.3.6.1.2.1.31.1.1.1.6) since this is a 10/100/1000Mbps switch port and the counters might wrap between polls.
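To put a number on the wrap concern: a Counter32 holds 2^32 octets, so at full line rate on a 1 Gbps port it can roll over in roughly 34 seconds. The small helpers below show the arithmetic; the function names are just for illustration.

```python
def seconds_to_wrap(counter_bits: int, link_bps: float) -> float:
    """Time for an octet counter to roll over at a sustained line rate."""
    return (2 ** counter_bits) * 8 / link_bps


def counter_delta(previous: int, current: int, counter_bits: int = 32) -> int:
    """Octets between two samples, assuming at most one wrap in between."""
    return (current - previous) % (2 ** counter_bits)


# seconds_to_wrap(32, 1e9) is about 34.36 seconds at 1 Gbps,
# and about 343.6 seconds at 100 Mbps.
```

With a 64-bit counter such as ifHCInOctets the same calculation gives thousands of years at 1 Gbps, which is why the high-capacity counters are the usual choice when precise byte counts matter.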

I’ve only seen the problem on the Avaya ERS 4850-GTS-PWR+ switch running HW:10 FW:5.6.2.1 SW:v5.6.3.024. I have not observed this problem on any other models including the Avaya ERS 5000, 4500, 470 or 460 switches.

Avaya confirmed the presence of the bug today and will be escalating the case to design.

I’m curious if Solarwinds or other management platforms have stumbled upon this bug.

Cheers!

Update: Monday, August 26, 2013

I’ve learned that Avaya will address this bug in software release 5.6.4 which is due out anytime now. ;)

Avaya ERS 8600 Software 7.0 Draft Release Notes

Michael McNamara — Fri, 26 Feb 2010 03:00:35 +0000

You’ll recall that back in December 2009 I wrote a post about the upcoming release of 7.0 software for the Ethernet Routing Switch 8600. At that time the software was in Controlled Availability (CA) and wasn’t expected to transition to General Availability (GA) until March 2010.

It seems that Nortel/Avaya are starting to get their ducks in a row as a draft version of the release notes hit Nortel’s support website over the past few days.

Here’s one known issue that grabbed my eye while browsing the release notes;

MLT/SMLT limitation(s)

Q01971344
In the case of a full-mesh SMLT configuration between 2 Clusters running OSPF (more likely an RSMLT configuration) because of the way that MLTs work in regards to CP generated traffic, it is highly recommended that the MLT port (or ports) that form the square leg of the mesh (versus the cross connect) be placed on lowered number slot/port, than the cross connections. The reason for this is because CP generated traffic is always sent out on the lowered numbered ports en active. Using this recommendation will keep some OSPF adjancey up if all the links of the IST fail. Otherwise the switches which have a failed IST could lose complete OSPF adjancey to both switches in the other Cluster and therefore become isolated.
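The recommendation boils down to a simple ordering check that could be run against a planned port layout. The sketch below assumes "lowest numbered" means ordering by slot and then by port, and the helper names and the slot/port string format are invented for this example.

```python
def parse_port(port: str) -> tuple:
    """Turn 'slot/port' into a sortable (slot, port) tuple of integers."""
    slot, number = port.split("/")
    return int(slot), int(number)


def cp_egress_port(active_ports: list) -> str:
    """Port that CP generated traffic would use under the lowest-port rule."""
    return min(active_ports, key=parse_port)


def square_leg_is_lowest(square_ports: list, cross_ports: list) -> bool:
    """True if every square-leg port sorts below every cross-connect port."""
    return max(map(parse_port, square_ports)) < min(map(parse_port, cross_ports))
```

For example, `square_leg_is_lowest(["1/1"], ["2/1"])` is true, while putting the square leg on 3/1 and the cross connect on 1/1 fails the check.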

Cheers!

Upgrade Nortel Ethernet Routing Switch 8600 to v5.1.1.1

Michael McNamara — Sat, 12 Dec 2009 13:00:32 +0000

This week I took on the task of upgrading a pair of Nortel Ethernet Routing Switch 8600s from 4.1.8.0 to 5.1.1.1 software. I used the opportunity to ‘test’ out the upgrade process and procedure for a much larger site that I will be upgrading next week.

The site that I upgraded has two ERS 8600 switches (dual 8692SF w/Mezz) running 15 VLANs, OSPF, VRRP, IST/SMLT, 15 edge ERS 5520 switches, 150 IP phones, 300 personal computers, printers, etc. It’s a fairly small site but it’s all IP telephony with a CS1000B.

The site that I will be upgrading next week has two ERS 8600 switches (dual 8692SF w/Mezz) running 80 VLANs, OSPF, VRRP, BGP, PIM-SM, IST/SMLT, 36 edge switches/stacks, 50 IP phones, 4000+ personal computers, printers, etc. This is a much larger site running a lot more VLANs along with BGP and PIM-SM.

With all the issues I’ve run into these past 12 months regarding the ERS 8600 switch I must admit that I was pleasantly surprised. Nothing blew up, nothing broke (knock on wood), and everything just seemed to work – there’s a surprise. I thought I would share the steps I took and the process I used. I will readily admit that I don’t have any RS blades in my environment so customers with RS blades might want to think twice before upgrading to 5.1.1.1 software (this is based on my discussions with other Nortel customers and their experience with 5.1.1.1 and RS blades).

I’ll also need to follow-up late next week and let everyone know how the larger upgrade went.

How did I do it?

Well I did it remotely but with access to the serial ports of all four CPUs (8692SFw/Mezz). I had to make a few configuration changes before I performed the upgrade, these are outlined in the release notes but I’ll touch on them here.

– Enable System Monitor (JDM, Edit Chassis -> System Flags and then look under “System Monitoring”)
– Reconfigure the SNMPv3 trap hosts with a retry value of 0 (read the release notes!)

I also took the opportunity to enable Jumbo Frame support with the ERS 8600 switch because that configuration change requires a reboot to take effect. (config sys set mtu 9600)

With those configuration changes saved I set out to copy the software up to the primary CPUs and then across to the standby CPUs. I took the extra precaution of copying all the software and configuration files from the FLASH to the PCMCIA card just in case I needed them there.

I started the process by upgrading the standby CPU (8692SFw/Mezz) on B switch first. From the primary CPU on the B switch I issued the following commands;

config bootconfig choice primary image-file /flash/p80a5111.img
config bootconfig choice secondary image-file /flash/p80a4180.img
save bootconfig
save config
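If you're repeating this across several switches, the four commands above are easy to generate. The sketch below infers the image naming pattern purely from the file names in this post (p80a plus the version digits), so treat that pattern as an assumption; it would also be ambiguous for any release with a multi-digit component.

```python
def image_name(version: str, kind: str = "a") -> str:
    """Build a file name such as p80a5111.img from a version like 5.1.1.1."""
    return f"p80{kind}{version.replace('.', '')}.img"


def bootconfig_commands(new_version: str, old_version: str) -> list:
    """Commands that boot the new release first and keep the old as fallback."""
    return [
        f"config bootconfig choice primary image-file /flash/{image_name(new_version)}",
        f"config bootconfig choice secondary image-file /flash/{image_name(old_version)}",
        "save bootconfig",
        "save config",
    ]
```

Calling `bootconfig_commands("5.1.1.1", "4.1.8.0")` reproduces the four lines above. Always confirm the file names against what is actually on /flash before pasting anything into a switch.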

With the configuration files changed and saved I connected to the standby CPU (peer telnet) and issued the command to upgrade the boot software on the standby CPU;

boot /flash/p80b5111.img

I watched the console as the CPU restarted and upgraded the boot flash;

################ 8K CPU BOOT FLASH Update ################

File obj-boot/p80b5111-mpc740.romH found in loaded image
File size: 786624 bytes
Number of flash sectors to be programmed: 7

WARNING: You are about to re-program your Boot Monitor FLASH
image.  Do NOT turn off power or press reset
until this procedure is completed.  Otherwise
the card may be permanently damaged!!!

Press  to stop monitor upgrade....

Erased 7 sectors of bootflash
Programmed BOOTFLASH Image
Verifying new BOOTFLASH Image
786624 matches, 0 mismatches

Updating Fileheader
Erased 1 sectors of bootflash
Fileheader update complete
Verifying new Fileheader
512 matches, 0 mismatches

Update complete!

Press return to reboot

The CPU then started to load the 5.1.1.1 software for the first time;

Copyright (c) 2009 Nortel, Inc.
CPU Slot 5:    PPC 745 Map B
Version:       5.1.1.1
Creation Time: Sep 30 2009, 15:13:36
Hardware Time: DEC 11 2009, 02:16:31 UTC
Memory Size:   0x10000000
Start Type:    cold
SMART MODULAR TECH SMART 221 CF

The /pcmcia device mounted successfully, but it appears
to have been formatted with pre-Release5.1 file system code.
Nortel recommends backing up the files from /pcmcia, and
executing dos-format /pcmcia to bring the file system on the
/pcmcia device to the latest ERS8600 baseline.
open_file:can't open "/pcmcia/pcmboot.cfg" 0x380003
S_dosFsLib_FILE_NOT_FOUND

/flash/  - Volume is OK

The /flash device mounted successfully, but it appears
to have been formatted with pre-Release5.1 file system code.
Nortel recommends backing up the files from /flash, and
executing dos-format /flash to bring the file system on the
/flash device to the latest ERS8600 baseline.

Loaded boot configuration from file /flash/boot.cfg
Attaching network interface lo0... done.

Press  to stop auto-boot...

Loading /flash/p80a5111.img ... 12606133 to 43734492 (43734492)
Starting at 0x1000000...

SMART MODULAR TECH SMART 221 CF

Booting PMC280 Mezz HW. Please wait.....
The BootCode address is 0xe000100 3303
.
Mezz taking over console and modem.....
Mezz CPU Booted successfully

Initializing backplane net with anchor at 0x4100... done.
Backplane anchor at 0x4100... ..
Mounting /flash: .done.
License File  does not exist
License File  does not exist
License File  does not exist
CPU6 [12/10/09 21:18:23] SW INFO Trial Period will expire in 60 days

Ethernet Routing Switch 8600  System Software Release 5.1.1.1
Copyright (c) 1996-2009 Nortel, Inc.

File does not exist
Critical Log file created

CPU6 [12/10/09 21:12:11] SW INFO System boot
CPU6 [12/10/09 21:12:11] SW INFO ERS System Software Release 5.1.1.1
CPU6 [12/10/09 21:12:11] SW INFO Waiting for cpu in slot 5 ... 2 seconds
CPU6 [12/10/09 21:12:13] SW INFO CPU card entering warm-standby mode...
CPU6 [12/10/09 21:12:16] SW INFO Loading configuration from /flash/config.cfg

**************************************************
* Copyright (c) 2009 Nortel, Inc.                *
* All Rights Reserved                            *
* Ethernet Routing Switch 8010                   *
* Software Release 5.1.1.1                       *
**************************************************

With that the standby CPU was upgraded to 5.1.1.1 software and I was set to upgrade the primary CPU which would cause the switch to fail over to the standby CPU. When I issued the boot /flash/p80b5111.img command on the primary CPU the standby CPU (slot 5) became the master and I observed the following on the console;

CPU5 [12/10/09 21:20:23] HW INFO Stand-by CPU in slot # 5 becoming master...
CPU5 [12/10/09 21:20:28] MPLS INFO All MPLS components are up and active
CPU5 [12/10/09 21:20:28] HW INFO Card inserted: Slot=5 Type=8692SF
CPU5 [12/10/09 21:20:29] HW INFO Card inserted: Slot=6 Type=8692SF
CPU5 [12/10/09 21:20:29] SW INFO R-Module inserted: Slot=1 Type=8630GBR, waiting to bootup...
CPU5 [12/10/09 21:20:29] SW INFO R-Module inserted: Slot=2 Type=8648GTR, waiting to bootup...
CPU5 [12/10/09 21:20:29] SW INFO R-Module inserted: Slot=3 Type=8683XLR, waiting to bootup...
CPU5 [12/10/09 21:20:29] HW INFO Initializing 8692SF in slot #5 ...
CPU5 [12/10/09 21:20:31] HW INFO Initializing 8692SF in slot #6 ...
CPU5 [12/10/09 21:20:37] SW INFO Slot  1: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:20:37] SW INFO Slot  2: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:20:49] SW INFO Slot  3: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:16] SW INFO Slot  1: 8630GBR Initializing.  Do not remove board.
CPU5 [12/10/09 21:21:16] SW INFO Slot  2: 8648GTR Initializing.  Do not remove board.
CPU5 [12/10/09 21:21:21] SW INFO Slot  1: 8630GBR Initialization completed.
CPU5 [12/10/09 21:21:21] SW INFO Slot  2: 8648GTR Initialization completed.
CPU5 [12/10/09 21:21:22] SW INFO Slot  1: Restart new image version 5.1.1.1
CPU5 [12/10/09 21:21:22] SW INFO Slot  2: Restart new image version 5.1.1.1
CPU5 [12/10/09 21:21:30] SW INFO Slot  3: 8683XLR Initializing.  Do not remove board.
CPU5 [12/10/09 21:21:31] SW INFO Slot  1: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:31] SW INFO Slot  2: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:36] SW INFO Slot  3: 8683XLR Initialization completed.
CPU5 [12/10/09 21:21:36] SW INFO Slot  3: Restart new image version 5.1.1.1
CPU5 [12/10/09 21:21:45] SW INFO Slot  3: Loading /flash/p80j5111.dld
CPU5 [12/10/09 21:21:49] SW INFO slot 2 found NP heartbeat - R-Module is online
CPU5 [12/10/09 21:21:51] SW INFO slot 1 found NP heartbeat - R-Module is online
CPU5 [12/10/09 21:22:12] SW INFO slot 3 found NP heartbeat - R-Module is online
CPU5 [12/10/09 21:22:12] HW INFO Initializing 8630GBR in slot #1 ...

SNMP-v3 VACM configuration is currently using default parameters.
These parameters should be changed for maximum security.

SNMP-v3 Having more than one entry in Group-access table for the same group-name with different security levels can cause a security hole

WARNING: THE ALLOWED LOG FILE SIZE HAS EXCEEDED CONFIGURATION LIMITS.
THE FILE SIZE IS CURRENTLY 1071131 BYTES!!!!
CPU5 [12/10/09 21:22:12] HW INFO Initializing 8648GTR in slot #2 ...

**************************************************
* Copyright (c) 2009 Nortel, Inc.                *
* All Rights Reserved                            *
* Ethernet Routing Switch 8010                   *
* Software Release 5.1.1.1                       *
**************************************************

Login:

CPU5 [12/10/09 21:22:13] HW INFO Initializing 8683XLR in slot #3 ...
CPU5 [12/10/09 21:22:14] SW INFO Loading configuration from /flash/config.cfg
CPU5 [12/10/09 21:22:15] SW INFO NTP Enabled
CPU5 [12/10/09 21:22:15] SW INFO The system is ready
CPU5 [12/10/09 21:22:15] SNMP INFO Booted with PRIMARY boot image source - /flash/p80a5111.img
CPU5 [12/10/09 21:22:17] SW INFO All the configured hosts not reachable

CPU5 [12/10/09 21:22:17] SW INFO A new log file = /pcmcia/ccf00005.001 is created

CPU5 [12/10/09 21:22:17] SW INFO PCMCIA card detected in Master CPU "sw-8600-ccr-a.site1.acme.org" slot 5, Chassis S/N SSPN6C0ANC
CPU5 [12/10/09 21:22:17] SNMP INFO Chassis with Power Supply redundancy
CPU5 [12/10/09 21:22:17] SNMP INFO Fan Up(FanId=1, OperStatus=2)
CPU5 [12/10/09 21:22:17] IP INFO the VRF OSPF Md5 key file 1 does not exist
CPU5 [12/10/09 21:22:17] SNMP INFO Fan Up(FanId=2, OperStatus=2)

CPU5 [12/10/09 21:23:03] SNMP INFO CPU switch over, stand-by CPU becoming master
CPU5 [12/10/09 21:23:03] SNMP INFO Sending Warm-Start Trap
CPU5 [12/10/09 21:23:03] SNMP INFO CPU switch over, stand-by CPU in slot # 5 became master
CPU5 [12/10/09 21:23:35] SNMP INFO Communication established with backup CPU

I will comment that for the time that the standby CPU was running 5.1.1.1 software and the primary CPU was still running 4.1.8.0 software the console became very sluggish and unresponsive. The CPU utilization also surged to 97%. I suspect the CPUs didn’t like trying to communicate with each other being so far apart on software releases.
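The console log also gives a rough measure of how long the switchover took. The sketch below pulls the timestamps from two log lines and subtracts them; which two events mark the start and end of the failover is an assumption made here, and the helper names are invented.

```python
import re
from datetime import datetime

STAMP = re.compile(r"\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\]")

# Two lines copied from the console output above.
LOG = """\
CPU5 [12/10/09 21:20:23] HW INFO Stand-by CPU in slot # 5 becoming master...
CPU5 [12/10/09 21:22:15] SW INFO The system is ready
"""


def event_time(log: str, needle: str) -> datetime:
    """Timestamp of the first log line containing needle."""
    for line in log.splitlines():
        if needle in line:
            match = STAMP.search(line)
            if match:
                return datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
    raise ValueError(f"no timestamped line containing {needle!r}")


def elapsed_seconds(log: str, start: str, end: str) -> float:
    """Seconds between the first lines matching start and end."""
    return (event_time(log, end) - event_time(log, start)).total_seconds()
```

Here `elapsed_seconds(LOG, "becoming master", "The system is ready")` gives 112 seconds, just under two minutes from the standby CPU taking over to the system reporting ready.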

I didn’t need to do anything special while upgrading the switches to have the other switch in the cluster maintain the network. The IST link was re-established between upgrading the switches when B was running 5.1.1.1 and A was still running 4.1.8.0. I just repeated the steps above for the A switch and everything worked just fine.

I did go back and clean up the log files; you might have noticed the warning in there about the log file being full. I didn’t reformat the /flash or /pcmcia filesystems because I wanted the option to downgrade if necessary. I can reformat those filesystems at a later point in time if the software proves stable and reliable.

I’m impressed with 5.1.1.1 so far, let’s see how it stands the test of time.

Cheers!

Update: Thursday December 17, 2009

I completed the 5.1.1.1 upgrade of the larger site I referred to above on Wednesday morning and I’m happy to report that everything is working well. I did get an initial scare when one of the core ERS 8600 switches (running 4.1.6.3) went belly up just before I started the upgrade. I had just completed re-configuring the VRRP interfaces on both ERS 8600 switches so the VRRP IDs would be unique. While the switch was still forwarding Layer 2 traffic it stopped processing all Layer 3 traffic, wouldn’t respond to ICMP ping and the IST went down.

The upgrade itself took longer than it usually does, especially with the SuperMezz cards installed. Once the switch came up, though, everything was working fine: OSPF, BGP, FDB, ARP, IST, SMLT, PIM-SM, etc.

I’m very hopeful that 5.1.1.1 will provide some much needed stability to the ERS 8600 switch!

Cheers!

Nortel ERS 5500 Software 5.1.5 Available

Michael McNamara — Thu, 30 Jul 2009 23:00:30 +0000

Nortel has released software 5.1.5 for the Nortel Ethernet Routing Switch 5500 series switches. There are no new features in the software release but a number of fixes.

When unknown multicast no flood filter was enabled the multicast packets used by OSPF were blocked (Q02007873)
Uplink fiber port was set to “Custom” instead of “Enabled” after code upgrade to 5.1.3 (Q02023262)
Stack did not properly pass MIB values for Auth-Status (Q02011169)
HTTP web-server crashed when running specific security test (Q02004709)
Under certain conditions, when a new MLT was configured, traffic did not properly flow through both links of the MLT (Q02011420)
Link did not come up when specific SFPs were used on (Q01966044)
SSH login accepted any username except blank (Q02010762)
Exception Error with Data Access Task Name “tIdt” (Q02024889)

You can find the release notes at the Nortel website.

Cheers!

Ethernet Routing Switch 8600 Software Release v4.1.8.2

Michael McNamara — Tue, 10 Feb 2009 03:00:37 +0000

It’s finally official… Nortel has released v4.1.8.2 software for the Ethernet Routing Switch 8600. This latest code promises to put all the ARP/FDB issues that surfaced in the 4.1.6.x software branch to rest. It also promises to provide increased efficiencies for those running switch clusters (IST). I’ve been running 4.1.8.0 software for the past 30+ days and believe it’s a stable release that customers can finally count on. The one word of warning for everyone out there revolves around VRRP IDs, you must make sure you have unique VRRP IDs across your entire switch.
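Checking for duplicate VRRP IDs before an upgrade is easy to script once you have a list of VLAN and VRID pairs from the switch. How you export that list is up to you; the input format and function name below are assumptions for this sketch.

```python
from collections import defaultdict


def duplicate_vrids(interfaces) -> dict:
    """Given (vlan_id, vrid) pairs, return {vrid: [vlans]} for reused VRIDs."""
    by_vrid = defaultdict(set)
    for vlan, vrid in interfaces:
        by_vrid[vrid].add(vlan)
    return {vrid: sorted(vlans) for vrid, vlans in by_vrid.items() if len(vlans) > 1}
```

For example, `duplicate_vrids([(10, 1), (20, 1), (30, 3)])` returns `{1: [10, 20]}`, telling you that VRID 1 is configured on both VLAN 10 and VLAN 20 and one of them needs to change.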

Anyone considering an upgrade should read the release notes carefully since there are a number of significant changes to the code.

You can find a copy of the release notes here but you’ll obviously need to visit the Nortel site to download the software.

Here’s an excerpt regarding the changes around SMLT/RSMLT;

New Features in This Release
SMLT/RSMLT Operational Improvements (CR Q01764193/Q01769324/Q01776485)

For previous SMLT operation, bringing the SMLTs Up/Down triggered the flushing of the entire MAC/FDB belonging to the SMLTs in both the IST Core Peer Switches. Flushing of the MAC addresses then causes the dependent ARP (for IP stations) to be re-resolved. For ARP resolution, ERS 8600 re-ARPs for all the SMLT learned ARPs. This created a major MAC/ARP re-learning effort. As the records were flushed, during the relearning period the exception (learning) packets will also be continuously forwarded to the CPU, thereby increasing the CPU load. This would further slow-down the SMLT re-convergence as well as the h/w record reprogramming. Since proper traffic flow with an ERS 8600 is completely dependent on the h/w records, this prior behavior could adversely affect convergence times, especially in very large networks (8000+ MACs/ARPs), and those networks also running with many multicast streams, as multicast streams often need to be forwarded to the CPU for learning, thereby also increasing CPU load. The SMLT changes in this release improve this operation significantly, and continue to allow all previous SMLT/RSMLT topologies to be supported. SMLT Operational Improvements will affect SMLT/RSMLT behavior in that the actual SMLT/RSMLT connection links on a powered-up IST Core Switch, will take longer to become active (link status up and forwarding) than with previous versions of code. During this time period the other Peer IST Core Switch will always be continuing to forward, therefore avoiding any loss of traffic in the network for all SMLT/RSMLT based connections. The SMLT/RSMLT associated links will not become active upon a boot-up until the IST is completely up, and a ‘new MAC/ARP/IP sync’ has occurred between the two Core IST Peer Switches in a Cluster.

Users may see occasional instances where the Remote SMLT Flag is False on both Peer Switches. This is normal, if the flag clears and is then set properly (False on one side, True on the other), once the FDB age-out for that associated VLAN has occurred. This behavior has no affect on user traffic operation – no user traffic loss or disruption will be seen under this condition.

For proper network behavior Nortel recommends to operate both IST switches with either the “new” or “old” SMLT architecture. Therefore SMLT operation between IST Peer Core switches with one switching operating with pre-4.1.8.x code, and the other operating with 4.1.8.x or later code is NOT supported. Additionally users will see some new informational log messages generated around this behavior. The new messages formats are listed below, along with the various situations they will be seen with.

Case 1: A switch running SMLT is reset. When the switch comes up, the messages below are displayed irrespective of the number of SMLTs:
CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally

Case 2: The system is up and running, but an SMLT UP event (from down) has happened. One Requested/Complete message pair is displayed for every SMLT that went down and has come up. In the following example, 2 x SMLTs went down and came up:
CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
CPU5 [06/05/08 05:05:45] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally
CPU5 [06/05/08 05:05:46] MLT INFO SMLT MAC /ARP Sync is Complete : Peer can now be used normally

NOTE: To determine which specific SMLT IDs are affected, look for the SMLT ID down/up log messages.

Case 3: Sync fails because of a difference in IST Peer software version (pre-4.1.8.x and 4.1.8.x), where one peer supports MAC/ARP sync but the other does not, or because of some other potential issue, such as a mis-configuration or the IST session not coming up. The system that was reset and is requesting sync will keep all of its ports locked down (except IST_MLT) until the IST comes up properly and sync has occurred. After 5 minutes the Log/Error messages below will be displayed:
CPU5 [05/15/08 05:28:51] MLT INFO SMLT MAC/ARP Sync Requested: CAUTION do not take ANY action with the peer at this time
< After 5 min>
CPU5 [05/15/08 05:33:55] MLT ERROR SMLT initial table sync is delayed or failed. Please check the peer switch for any partial config errors. All the ports in the switch except IST will remain locked.
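If you collect these logs centrally, the three cases above can be told apart with a few lines of scripting. The following is only a rough sketch in Python, assuming the log lines look exactly like the samples shown above (the regular expression and the function names classify and summarize are my own, not anything provided by Nortel):

```python
import re
from datetime import datetime

# Assumed layout, taken from the samples above:
#   CPU5 [06/05/08 05:05:45] MLT INFO <message text>
LOG_RE = re.compile(
    r"^(CPU\d+) \[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] MLT (INFO|ERROR) (.*)$"
)

def classify(line):
    """Return (timestamp, event) for an SMLT sync message, or None."""
    m = LOG_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group(2), "%m/%d/%y %H:%M:%S")
    text = m.group(4)
    if "Sync Requested" in text:
        return ts, "requested"
    if "Sync is Complete" in text:
        return ts, "complete"
    if "initial table sync is delayed or failed" in text:
        return ts, "failed"
    return None

def summarize(lines):
    """Count sync events; 'outstanding' is requests without a completion."""
    counts = {"requested": 0, "complete": 0, "failed": 0}
    for line in lines:
        event = classify(line)
        if event:
            counts[event[1]] += 1
    counts["outstanding"] = counts["requested"] - counts["complete"]
    return counts
```

An outstanding count above zero together with a failed message would correspond to Case 3, where the ports stay locked until the IST and sync recover.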

NOTE: All known failover times for SMLT/RSMLT operation are now, and always have been, sub-second. With this release all known fail-back or recovery times have been improved to be within 3 seconds, especially for very large scaled environments, in order to provide the redundancy required for converged networks. These values are for unicast traffic only. Not all IP Multicast failover or fail-back/recovery situations can provide such times today, as many situations depend on the IPMC protocol recovery. For best IPMC recovery in SMLT/RSMLT designs, the use of static RPs for PIM-SM is recommended, with the same CLIP IP address assigned to both Core IST Peers within the Cluster, and to all switches with a full-mesh or square configuration. Failover or fail-back/recovery times for any situations that involve higher-layer protocols cannot always be guaranteed. Reference the Network Design Guide for your specific code release for recommendations on best practices to achieve best results. In many situations, it is abnormal corner-case events for which times are extended.

For best results, VLACP MUST also be used. The SMLT/RSMLT improvements noted here have been optimized to always function with VLACP. Therefore, for best results a pure Nortel SMLT/RSMLT design is best. We still support SMLT designs with any non-Nortel devices that support some level of link aggregation, but fail-back/recovery times cannot be guaranteed.

NOTE: VLACP configuration should now use a short timer of 500 msec (or higher) and a minimum timeoutscale of 5. Lower values can be used, but should any VLACP ‘flapping’ occur, the user will need to increase one or more of the values. These timers have been proven to work for any large scaled environments (12,000 MACs), and also provide the 3 second recovery time required for converged networks (5 x 500 msec = 2.5 seconds). Using these values may not increase re-convergence or fail-back/recovery times, but instead guarantee these times under all extreme conditions. (CR Q01925738-01 and Q01928607)

Users should also note that if VLACP is admin disabled on one side of the link/connection, VLACP will bring the associated remote connection down. Because the remote connection still keeps link up, the side with VLACP admin disabled will now have a black-hole connection to the remote switch, which will drop all packets sent to it. If VLACP is disabled on one side of a connection, it MUST also be disabled on the remote side or else traffic loss will likely occur. The same applies to LACP configurations for 1 port MLTs as well.
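The timer arithmetic in the VLACP note is simply the short timer multiplied by the timeoutscale. As a quick sanity check for your own values, here is a minimal sketch in Python; the function names are hypothetical, and the thresholds (500 msec, timeoutscale of 5) are the ones quoted in the note above:

```python
def vlacp_timeout_ms(short_timer_ms, timeout_scale):
    """Time before VLACP declares the link down: short timer x timeoutscale."""
    return short_timer_ms * timeout_scale

def meets_recommendation(short_timer_ms, timeout_scale):
    """True if both values are at or above the recommended minimums."""
    return short_timer_ms >= 500 and timeout_scale >= 5

# The recommended values from the note: 5 x 500 msec = 2500 msec (2.5 seconds).
print(vlacp_timeout_ms(500, 5))
```

Values below the recommended minimums are still permitted per the note, so treat a False result as a warning rather than an error.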

NOTE: If using VRRP with SMLT, it is now HIGHLY recommended (treat this as a MUST) to use unique VRIDs, especially when scaling VRRP (more than 40 instances). Use of a single VRID for all instances is supported within the standard, but when such a configuration is used in scaled SMLT designs, instability could be seen. A [better] alternative method, which allows scaling to the maximum number of IP VLANs, is to use RSMLT designs instead. See Section 10 in this Readme (page 10) for additional information on how to easily move from a VRRP design to an RSMLT design.

NOTE: For any SMLT design, for L2 SMLT VLANs, it is now HIGHLY recommended to change the VLAN FDB aging timer from its default value of 300 seconds to 1 second higher than the system setting for the ARP aging timer. FDB timers are set on a per-VLAN basis. If using the default system ARP aging time (config ip arp aging) of 360 minutes, then the proper value for the FDB aging timer (config vlan x fdb-entry aging-time) is 21601 seconds, which is 360 minutes (6 hours) plus 1 second. This has the system use only the ARP aging timer for aging, rather than the FDB aging timer. This value has been shown to work very well to assure no improper SMLT learning. The use of this timer has one potential side effect: for legacy modules it limits the system to a maximum of around 12,000 concurrent MACs; for R-mode systems the limit remains at 64K, even with this timer setting. With this timer, should an edge device move, the system will still immediately re-learn and re-populate the FDB table properly, and not have to wait for the 6 hour (plus 1 second) timer to expire. No negative operational effects are known when using this timer value. For non-SMLT based VLANs the default FDB aging timer of 300 may be used, or it can be changed, or even also set to 21601. For this reason the default value of the FDB aging timer will remain at 300 (seconds) within all code releases.
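If your ARP aging timer is not at the default, the matching FDB value is easy to work out: convert the ARP aging minutes to seconds and add one. A small sketch in Python, assuming the command form quoted in the note above (config vlan x fdb-entry aging-time); the function names and the example VLAN ID are purely illustrative:

```python
def fdb_aging_seconds(arp_aging_minutes=360):
    """FDB aging timer: ARP aging time converted to seconds, plus 1 second."""
    return arp_aging_minutes * 60 + 1

def fdb_aging_command(vlan_id, arp_aging_minutes=360):
    """Build the per-VLAN command string in the form quoted in the note."""
    seconds = fdb_aging_seconds(arp_aging_minutes)
    return f"config vlan {vlan_id} fdb-entry aging-time {seconds}"

# Default ARP aging of 360 minutes gives 21601 seconds.
print(fdb_aging_command(10))
```

Since FDB timers are set per VLAN, you would repeat the command for each L2 SMLT VLAN.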

Cheers!