A few months ago I wrote about issues with the SNMP MIBs for the Avaya Ethernet Routing Switch 4800. Unfortunately, the problems didn’t stop there. Last week I finally found the time to troubleshoot a problem with one of our internal applications that provides a list of idle ports for each switch/stack. I wrote this application back in 2003; it uses Perl and SNMP to query the ifInOctets MIB-II counter for each switch port. The application stores that value between runs and generates a daily report that includes a list of ports whose counters haven’t changed in 45 days. We assume that if the port hasn’t been active in 45 days it’s idle and can be reused (un-patched in the closet).
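The idle-port logic described above can be sketched roughly as follows. This is not the original Perl application, just a minimal Python illustration; the `update_port` and `idle_ports` names, the in-memory `state` dictionary, and the port labels are all assumptions made for the example.

```python
from datetime import date, timedelta

IDLE_DAYS = 45  # threshold mentioned in the post


def update_port(state, port, counter, today):
    """Store the latest counter and remember the date it last changed."""
    previous = state.get(port)
    if previous is None or previous["counter"] != counter:
        state[port] = {"counter": counter, "last_change": today}


def idle_ports(state, today, idle_days=IDLE_DAYS):
    """Return ports whose counter has not changed for at least idle_days."""
    cutoff = today - timedelta(days=idle_days)
    return sorted(p for p, rec in state.items() if rec["last_change"] <= cutoff)


if __name__ == "__main__":
    state = {}
    start = date(2013, 6, 1)
    update_port(state, "1/1", 123456, start)
    update_port(state, "1/2", 654321, start)

    later = start + timedelta(days=IDLE_DAYS)
    update_port(state, "1/1", 123456, later)  # counter unchanged
    update_port(state, "1/2", 700000, later)  # counter changed
    print(idle_ports(state, later))
```

With a switch that returns a fresh bogus counter on every run, `last_change` would be reset daily for every port, which is why no port would ever be reported as idle.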
The application was the original suspect, and since I wrote it years back I was asked to look at the problem. Whenever we add a new model of switch, be it a Cisco Nexus 2248TP or an Avaya ERS 4850-GTS-PWR+, there’s usually some tweaking involved to make sure that everything works properly. That’s the price you pay for writing your own software solutions. This time around, however, it became clear pretty quickly that something else was wrong. Initially I was puzzled, since every snmpwalk I performed on the ERS 4850 returned the proper values. It wasn’t until I crafted a command line with multiple SNMP OIDs (just like the script) that I was able to observe the problem.
The problem appears to be related to how the Avaya ERS 4850-GTS-PWR+ handles SNMP queries with multiple SNMP OIDs included in the same request. If I perform an SNMP query for each of the following OIDs in the same request, I get the same incorrect ifInOctets value back for each port.
- 1.3.6.1.2.1.2.2.1.1.38 – ifIndex
- 1.3.6.1.2.1.2.2.1.10.38 – ifInOctets
- 1.3.6.1.2.1.2.2.1.3.38 – ifType
Notice how the ifInOctets value is the same for every port, although if I re-query the switch it will provide a different value for every port. In short, the incorrect value breaks the application, since it appears that every port is changing daily and no ports ever become idle.
[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifIndex.1 ifInOctets.1 ifType.1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifInOctets.2 = Counter32: 1106547808
IF-MIB::ifType.2 = INTEGER: ethernetCsmacd(6)
[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifIndex.2 ifInOctets.2 ifType.2
IF-MIB::ifIndex.3 = INTEGER: 3
IF-MIB::ifInOctets.3 = Counter32: 1106547808
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)
[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifIndex.3 ifInOctets.3 ifType.3
IF-MIB::ifIndex.4 = INTEGER: 4
IF-MIB::ifInOctets.4 = Counter32: 1106547808
IF-MIB::ifType.4 = INTEGER: ethernetCsmacd(6)
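A polling script could guard against this symptom with a simple sanity check: if several ports report exactly the same non-zero ifInOctets value in one run, the data is probably suspect. The sketch below is only an illustration; the `shared_counter_values` function and the `min_ports` threshold are hypothetical, and the sample readings are the values from the outputs in this post.

```python
from collections import defaultdict


def shared_counter_values(readings, min_ports=3):
    """Map each non-zero counter value seen on >= min_ports ports to those ports."""
    ports_by_value = defaultdict(list)
    for port, value in readings.items():
        if value != 0:  # unused ports can legitimately all report 0
            ports_by_value[value].append(port)
    return {
        value: sorted(ports)
        for value, ports in ports_by_value.items()
        if len(ports) >= min_ports
    }


# Values returned when the three OIDs are combined in one request
multi_oid = {2: 1106547808, 3: 1106547808, 4: 1106547808}
# Values returned when ifInOctets is queried on its own
single_oid = {2: 3903266154, 3: 2492668434, 4: 792830238}

print(shared_counter_values(multi_oid))   # flags ports 2, 3 and 4
print(shared_counter_values(single_oid))  # nothing flagged
```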
If I issue an SNMP get-next for just the single OID, the switch returns the correct value:
[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifInOctets.1
IF-MIB::ifInOctets.2 = Counter32: 3903266154
[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifInOctets.2
IF-MIB::ifInOctets.3 = Counter32: 2492668434
[root@roo ~]# snmpgetnext -v2c -cpublic sw-icr3-psyc.acme.org ifInOctets.3
IF-MIB::ifInOctets.4 = Counter32: 792830238
The result is the same whether I use SNMP v1 or SNMP v2c.
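Since single-OID requests return correct values, one possible workaround (not necessarily what our script ended up doing) is to send one get-next per OID instead of bundling them. The sketch below only builds the Net-SNMP `snmpgetnext` command lines; the `getnext_commands` and `run_commands` helpers are hypothetical names, and actually running the commands assumes Net-SNMP is installed and the switch is reachable.

```python
import subprocess


def getnext_commands(host, community, oids, split=True):
    """Build snmpgetnext argv lists: one per OID if split, else one combined."""
    base = ["snmpgetnext", "-v2c", "-c", community, host]
    if split:
        return [base + [oid] for oid in oids]
    return [base + list(oids)]


def run_commands(commands):
    """Run each command and return its stdout (requires Net-SNMP on the PATH)."""
    return [
        subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        for cmd in commands
    ]


if __name__ == "__main__":
    oids = ["ifIndex.1", "ifInOctets.1", "ifType.1"]
    # Print the commands rather than executing them against a real switch
    for cmd in getnext_commands("sw-icr3-psyc.acme.org", "public", oids):
        print(" ".join(cmd))
```

The trade-off is more round trips per port, which matters on large stacks but is likely acceptable for a once-a-day report.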
The script itself really isn’t concerned with precision; we actually only record the last 6 digits of the counter. If we were concerned about precision, we might have to start using ifHCInOctets (1.3.6.1.2.1.31.1.1.1.6), the 64-bit counter, since these are 10/100/1000 Mbps switch ports and the 32-bit counters might wrap between polls.
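To put the wrap concern in numbers, here is a rough calculation of how quickly a 32-bit octet counter can roll over, assuming the port receives traffic at full line rate the whole time. The `seconds_to_wrap` and `last_six_digits` helpers are hypothetical, and taking the counter modulo 1,000,000 is only my guess at how "the last 6 digits" could be computed.

```python
def seconds_to_wrap(link_bps, counter_bits=32):
    """Seconds until an octet counter wraps at a sustained line rate."""
    octets = 2 ** counter_bits   # counter wraps after this many octets
    return octets * 8 / link_bps  # 8 bits per octet


def last_six_digits(counter):
    """Keep only the last six decimal digits of a counter value."""
    return counter % 1_000_000


print(round(seconds_to_wrap(1_000_000_000), 1))  # 1 Gbps   -> 34.4
print(round(seconds_to_wrap(100_000_000), 1))    # 100 Mbps -> 343.6
print(last_six_digits(3903266154))               # -> 266154
```

For a daily idle-port check this does not matter much, since any change in the stored digits still marks the port as active.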
I’ve only seen the problem on the Avaya ERS 4850-GTS-PWR+ switch running HW:10 FW:5.6.2.1 SW:v5.6.3.024. I have not observed this problem on any other models including the Avaya ERS 5000, 4500, 470 or 460 switches.
Avaya confirmed the presence of the bug today and will be escalating the case to design.
I’m curious if Solarwinds or other management platforms have stumbled upon this bug.
Cheers!
Update: Monday, August 26, 2013
I’ve learned that Avaya will address this bug in software release 5.6.4 which is due out anytime now. ;)