Michael McNamara

Website Monitoring with Bash and Nagios Plugins

Michael McNamara — Tue, 23 Dec 2014 13:00:37 +0000

It’s no surprise that I need to know when our websites are down, but I also need to know why they went down. Often the redundancy will kick in and the website will quickly recover. However, the question remains: why was the website down? Was there a circuit failure, a router failure, a load-balancer failure, a web server failure, an application server failure or a database failure? While you can glean a lot of information from the log data generated by the routers, firewalls, switches, load-balancers and web servers, sometimes there are gaps in that data. A few months ago I put together a quick bash script that calls a few Nagios plugins to help me gather some data points in case I needed to look back in time, after the fact, to determine what had caused an outage or failure. I decided to stand up a few Linode Linux servers spread out across a number of Data Centers around the world. There are dozens if not hundreds of commercial solutions for website monitoring, but I wanted something cheap over which I had complete control, and writing this script took all of 2 hours one afternoon.

The script will run every 60 seconds and will call the origin web server via an HTTP call and validate that it’s returning the proper HTML content. If the server fails to answer the first HTTP call or the response doesn’t contain the prerequisite content, the script will wait and try a second time. If the second HTTP call fails, the script will log that fact and try a PING to verify that it can reach the web server. If the PING fails, the script will kick off a traceroute using mtr to try to isolate the location of the problem. A second script performs ICMP pings every 60 seconds to every piece of our public network infrastructure, including the firewalls and load-balancers across our multiple Data Centers, from multiple public Internet points.

The combination of the data points from both scripts, run in multiple Data Centers around the world, made it relatively easy to quickly determine what had transpired during an event. In one case we were alerted to a peering issue between NAC and Level3. In another event we observed a complete disconnect between NetworkLayer/SoftLayer and Comcast between 1AM and 2AM one night. I’m guessing that was some type of scheduled maintenance and they didn’t have BGP configured properly. There were a few times, though, when the script would alert that everything was down but only from a single Data Center; this often indicated that there was a problem with the Internet peers that connected that Data Center to the Internet in general. It wasn’t a foolproof solution by any means but it gave me the data points I needed and the freedom to adapt as needed.

You can download the entire script from the link.

#!/bin/bash
#
# Filename: /usr/local/monitor/monitor.sh
#
# Purpose:  Monitor the availability of several websites and report their
#           availability. This script leverages several Nagios plugins to
#           help simplify the collection of data.
#
# Language: Bash Script
#
# Author:   Michael McNamara
#
# Version:  0.9
#
# Date:     Oct 26, 2014
#
# License:
#           Copyright (C) 2014 Michael McNamara (mfm@michaelfmcnamara.com)
#
#           This program is free software: you can redistribute it and/or modify
#           it under the terms of the GNU General Public License as published by
#           the Free Software Foundation, either version 3 of the License, or
#           (at your option) any later version.
#
#           This program is distributed in the hope that it will be useful,
#           but WITHOUT ANY WARRANTY; without even the implied warranty of
#           MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#           GNU General Public License for more details.
#
#           You should have received a copy of the GNU General Public License
#           along with this program.  If not, see 
#
# Changes:
#           Nov 11, 2014 add lock file checking to prevent multiple instances
#           Oct 31, 2014 added code to retry the HTTP_CHECK before alarm
#           Oct 30, 2014 added additional websites to query
#           Oct 27, 2014 cleaned up script/updated documentation
#
# Requirements:
#
#           Nagios check_fping plugin
#           Nagios check_http plugin
#           Nagios check_dns plugin
#           http://nagiosplugins.org/
#
# Notes:
#        Command Line Reference;
#          ./monitor.sh
#
#

# Declare Variables
SENDMAIL="/bin/mail"

CHECK_HTTP="/usr/local/monitor/check_http"
CHECK_FPING="/usr/local/monitor/check_fping"
CHECK_DNS="/usr/local/monitor/check_dns"

MTR="/usr/sbin/mtr"
LOG="/usr/local/monitor/monitor.log"
LOCKFILE="/tmp/monitor.tmp"
LOCATION="New York, NY"

MAIL_TO="root"
MAIL_SUBJECT="HTTP: Web Application Status Report ($LOCATION)"

#
### SITE SPECIFIC INFORMATION <<<<< YOU SHOULD EDIT THE LINES BELOW
#
# IPS = List of webservers by FQDN or IP address
IPS=( webserver1.acme.com webserver2.acme.com webserver3.acme.com)

# HOSTS = The FQDN of the web property that resides on the webserver
HOSTS=( www.brandone.com www.brandtwo.com www.brandthree.com )

# URLS = The path to be appended to the FQDN of the web property
URLS=( /brand1/index.jsp /brand2/index.jsp /brand3/index.jsp )

# CONTENTS = A string that should be found on
#            the webpage for each brand or web property.
CONTENTS=( "Brand One" "Brand Two" "Brand Three" )
#
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 

#################################################################### 
# M A I N    P R O G R A M 
#################################################################### 
# LETS WAIT FOR A LITTLE SO WE'RE NOT FIRING AT THE TOP OF THE MINUTE
sleep 15

# LETS CHECK TO SEE IF THERES ALREADY A COPY RUNNING
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
    echo "already running"
    exit
fi

# SETUP A TRAP INCASE WE EXIT PREMATURELY
trap "rm -f ${LOCKFILE}; exit" INT TERM EXIT

# LETS CREATE A LOCKFILE CONTAINING OUR PROCESS ID ($$)
echo $$ > ${LOCKFILE}

# LETS ITERATE OVER THE WEBSERVERS ($IPS[])
for (( i = 0; i < ${#IPS[@]}; i++ )); do
    # LETS TRY A QUICK HTTP CALL AND SEE WHAT WE GET
    # (THE CONTENT STRING IS QUOTED SO "Brand One" STAYS A SINGLE ARGUMENT)
    RESULT1="$(${CHECK_HTTP} -I "${IPS[$i]}" -H "${HOSTS[$i]}" -u "${URLS[$i]}" -s "${CONTENTS[$i]}")"

    # LETS CHECK THE RESULT
    if [[ $RESULT1 =~ "OK" ]]
    then
         # IF THE RESULT WAS OK THEN LOG THE RESULT
         echo "$(date) OK ${IPS[$i]} ${HOSTS[$i]} $RESULT1" >> $LOG
    else
        # IF THE RESULT WAS BAD LETS DO MORE, FIRST WAIT A LITTLE
        sleep 10

        # LOG THAT WE FAILED THE FIRST CHECK
        echo "$(date) FAIL ${IPS[$i]} ${HOSTS[$i]} $RESULT1" >> $LOG

        # ATTEMPT A SECOND HTTP CALL AND SEE WHAT WE GET
        RESULT2="$(${CHECK_HTTP} -I "${IPS[$i]}" -H "${HOSTS[$i]}" -u "${URLS[$i]}" -s "${CONTENTS[$i]}")"

        # LETS CHECK THE RESULT
        if [[ $RESULT2 =~ "OK" ]]
        then
            # IF THE RESULT WAS OK THEN LOG THE RESULT
            echo "$(date) RETRY OK ${IPS[$i]} ${HOSTS[$i]} $RESULT2" >> $LOG
        else
            # IF THE RESULT WAS BAD LETS DO MORE, FIRST LOG AND EMAIL
            echo "FAIL RETRY ${IPS[$i]} ${HOSTS[$i]} $RESULT2" | $SENDMAIL -s "$MAIL_SUBJECT" $MAIL_TO
            echo $(date) RETRY FAIL ${IPS[$i]} ${HOSTS[$i]} $RESULT2 >> $LOG

            # LETS TEST AN ICMP CALL TO THE WEBSERVER TO VALIDATE ITS NOT A NETWORK ISSUE
            PING1="`${CHECK_FPING} ${HOSTS[$i]} -n 3`"

            # LETS CHECK THE RESULT
            if [[ $PING1 =~ "OK" ]]
            then
                # IF THE RESULT WAS OK THEN LOG THE RESULT
                echo $(date) OK ${IPS[$i]} ${HOSTS[$i]} $PING1 >> $LOG
            else
                # IF THE RESULT WAS BAD THEN LOG AND EMAIL AND COLLECT A TRACEROUTE
                echo "FAIL RETRY PING ${IPS[$i]} ${HOSTS[$i]} $PING1" | $SENDMAIL -s "$MAIL_SUBJECT" $MAIL_TO
                echo $(date) FAIL ${IPS[$i]} ${HOSTS[$i]} $PING1 >> $LOG
                ${MTR} --no-dns -rc 5 ${IPS[$i]} >> $LOG
            fi
        fi
    fi
done

# LETS WAIT FOR A FEW SECONDS
sleep 10

# LETS CLEAN UP AND REMOVE THE LOCKFILE
rm -f ${LOCKFILE}

# THATS ALL FOLKS!
exit 0
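
Since every check appends a line to the log, a quick summary can answer "which site failed, and how often?" after the fact. The sketch below is not part of the script above. It assumes the log lines look the way the echo statements write them (date, status words, web server, site, plugin output), and the function name count_failures is made up for illustration.

```shell
# count_failures LOGFILE
# Prints "<count> <webserver> <site>" for every web server/site pair that
# logged a "RETRY FAIL" line, i.e. both HTTP attempts failed.
count_failures() {
    # The two fields right after the "RETRY FAIL" marker are the
    # web server and the site, as written by the monitoring script.
    grep 'RETRY FAIL' "$1" |
        sed 's/.*RETRY FAIL \([^ ]*\) \([^ ]*\).*/\1 \2/' |
        sort | uniq -c | awk '{ print $1, $2, $3 }'
}
```

It would be called as count_failures /usr/local/monitor/monitor.log. First-attempt failures (logged as FAIL without RETRY) are deliberately left out, since those recovered on the second try.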

Cheers!

Note: This is a series of posts made under the Network Engineer in Retail 30 Days of Peak, this is post number 29 of 30. All the posts can be viewed from the 30in30 tag.

Image Credit: michele de notaristefani

Swatch – Simple Log Watcher

Michael McNamara — Sun, 21 Dec 2014 00:00:37 +0000

It’s a wonder the odd and bizarre problems that seem to find me. Straight from the front lines I had an issue with a Motorola WS5100 v3.3.5.0-002R falling down at the most inopportune time of the retail calendar. While the original problem appeared on December 17 it returned last night to spoil the weekend.

I'm troubleshooting a @Motorola WS5100 v3.3.5.0-002R that is being plagued with frequent high CPU utilization and generating 25,000+ pps

— Michael McNamara (@mfMcNamara) December 18, 2014

In the process of trying to understand the problem and come up with a solution I wanted better visibility and alerting when the problem actually occurred. I didn’t want to incur the delay of the users calling the help desk and the help desk calling me. Thankfully there is a SYSLOG message recorded when an Access Port experiences a watchdog reset, so I had a log message; now I needed to find a way to alert on that message.

That’s where I turned to swatch, a handy little utility that will monitor log files for regular expressions and then take whatever action is configured, such as ringing the console bell or sending an email message. I installed swatch with relative ease thanks to yum and then set out to configure it appropriately.

I created the following configuration file;

#/etc/swatchrc

# swatchrc - define regular expressions and generate alerts when matches are found in logs
#            daemon is started from /etc/cron.d/swatch

# Motorola AP300 - malfunctioning AP ignore events from this device

ignore /00-A0-F8-ZZ-ZZ-ZZ/

# Motorola WS5100 Access Port Adoption Errors Reboot/Watchdog events

#Dec 20 07:53:07 ACME-WLS1 %CC-6-APREADOPTREASON: AP 00-A0-F8-XX-XX-XX readoption reason: ColdBoot/Watchdog
#Dec 20 07:53:25 ACME-WLS1 %CC-6-APREADOPTREASON: AP 00-A0-F8-XX-XX-XX readoption reason: Link failed

# Let's look for the phrase readoption and we'll alert on that text

watchfor /readoption/
        exec "echo '$_' | mail swatch -s 'SWATCH: Motorola WS5100 Adoption Issue' "
        threshold track_by=$6,type=limit,count=1,seconds=60
        echo=red
        bell 5

#end

In the swatch configuration I used the mail alias of swatch, so I edited the /etc/aliases file to make sure that the entire team would receive the alert;

#
#  Aliases in this file will NOT be expanded in the header from
#  Mail, but WILL be visible over networks or from /bin/mail.
#
#       >>>>>>>>>>      The program "newaliases" must be run after
#       >> NOTE >>      this file is updated for any changes to
#       >>>>>>>>>>      show through to sendmail.
#

# Basic system aliases -- these MUST be present.
mailer-daemon:  postmaster
postmaster:     root

# General redirections for pseudo accounts.
bin:            root
daemon:         root
adm:            root
...
...
...
swatch:         root,mike,john,dan,tom

If the problem is extremely important I’ll usually add the email-to-SMS text message gateway address for my provider. This way I’ll get both an email message and an SMS text message alerting me to the problem.

# Verizon SMS Text Messaging 123456789@vtext.com
# AT&T SMS Text Messaging 123456789@txt.att.net
# T-Mobile SMS Text Messaging 123456789@tmomail.net
# Sprint SMS Text Messaging 123456789@messaging.sprintpcs.com

I made sure to recompile the aliases file with the newaliases command and then I set off to run swatch in the foreground of my SSH session.

[root@centos /]# swatch -c /etc/swatchrc -p 'tail -f -n 0 /var/log/loc1fac17.log'

*** swatch version 3.2.3 (pid:30643) started at Sat Dec 20 15:54:14 EST 2014

And I waited for the event.

Now I could go about doing some research and due diligence without worrying that I might inadvertently fail to spot the problem.

I’ll let you know how it turned out!

Cheers!

Note: This is a series of posts made under the Network Engineer in Retail 30 Days of Peak, this is post number 26 of 30. All the posts can be viewed from the 30in30 tag.

Statseeker

Michael McNamara — Sat, 17 Nov 2012 13:38:08 +0000

The Networking Field Day 4 delegates and I met with Stewart Reed of Statseeker on the sixth floor of Casino M8trix in San Jose, CA on Wednesday afternoon October 10, 2012.

I have no personal experience with Statseeker, although I have a lot of experience running large scale implementations of MRTG and RRD on both physical hardware and virtual guests.

I’m going to outline the different presentations that we heard and perhaps make a few points here and there if I have anything useful to say. I’ll include a short blurb from Statseeker in italics to help define/describe each product or solution. Thankfully since the sessions were recorded you can watch for yourself and make your own informed opinion.

Here’s my disclaimer; I’m not endorsing any of the solutions presented below. I’m merely passing on the information along with a few comments of my own. If you have any personal opinions about the solutions below why not share them with us in the comments?

Statseeker

Tech Field Day Video Part 1 | Part 2
by Stewart Reed

Statseeker is highly scalable, industrial strength, network monitoring software that delivers 100% visibility of every interface, on every device, on any sized network, every 60 seconds. Statseeker scales up to 500,000 interfaces per server, installs as an appliance, is configured in minutes, reports in any time zone and never rolls up the data. Monitoring of Network Interfaces and Devices, NetFlow, sFlow, LAN Traffic, Traps, NBAR, Syslog, IP SLA, Disk, Memory, Temperature, CPU, Servers, Printers, UPS and more… Statseeker was the tool of choice to collect data on the performance of more than 154,000 interfaces, across more than 3,000 sites, from a single server.

Networking Field Day 4 Demo

My Thoughts?

Statseeker is a performance monitoring solution in competition with MRTG/RRD, What’s Up Gold, Solarwinds, Nagios, Cacti, etc. Statseeker differentiates itself in four ways, 1) it polls every interface in the network, 2) it polls every interface every 60 seconds, 3) it doesn’t roll-up or average any of the previously captured data points, 4) scalability – it can poll an incredible number of interfaces on a single low end server or desktop hardware.

I’ve been working with MRTG and RRD for years now and it’s always a challenge managing and tuning any sizable installation. And while I’m a former system administrator not everyone on the team is as skilled or knowledgeable when it comes to Linux administration or MRTG/RRD in general. While it’s nice to have open source solutions sometimes a commercial offering that just gets the job done without any daily care or feeding is just what the doctor ordered.

In the case of Statseeker it does performance monitoring really well and at scale which can be challenging.

How much disk space does Statseeker need? It requires 1GB yearly for every 1,000 interfaces you plan on monitoring.
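
As a rough worked example of that sizing rule, the sketch below estimates storage for a given interface count and retention period. The function name is made up for illustration, and the 154,000-interface figure is simply the one quoted in the Statseeker blurb above; real requirements may well differ.

```shell
# statseeker_disk_gb INTERFACES YEARS
# Estimate disk space in GB using the quoted rule of thumb:
# 1 GB per year for every 1,000 monitored interfaces (rounded up).
statseeker_disk_gb() {
    interfaces=$1
    years=$2
    # Integer ceiling of interfaces/1000, multiplied by the number of years.
    echo $(( (interfaces + 999) / 1000 * years ))
}

statseeker_disk_gb 154000 1   # 154 (GB for one year)
statseeker_disk_gb 154000 5   # 770 (GB for five years)
```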

I’m impressed with the scalability of Statseeker. While Statseeker primarily focuses on performance monitoring they also handle device availability (and alerting), SNMP traps, syslog, NetFlow and sFlow (no IPFIX I guess).

If you’d like to hear more, Statseeker recently recorded a sponsored podcast with Greg and Ethan over at the Packet Pushers Podcast.

And there’s this cute marketing video that Statseeker released on YouTube;

http://www.youtube.com/watch?v=6o_ts5ztPGE&feature=player_embedded&noredirect=1

Cheers!

Solarwinds – New Sponsor

Michael McNamara — Thu, 01 Nov 2012 01:02:39 +0000

I’m happy to announce that Solarwinds has become a supporting sponsor of this blog and the discussion forums.

Since our founding in 1999, SolarWinds’ (NYSE: SWI) mission has been to provide purpose-built products that are designed to make IT professionals’ jobs easier. We offer value-driven products and tools that solve a broad range of IT management challenges – whether those challenges are related to networks, servers, applications, storage or virtualization

Solarwinds has a plethora of management tools that cover application & server, network, log & security information, storage and virtualization. They have received numerous awards and are well respected among system administrators and network engineers alike. Solarwinds also has a vast number of free tools available at no cost to administrators and engineers. I won’t list every tool but I’ll list some of the tools that I’ve used in the past;

Network Management

Real-Time Bandwidth Monitor
IP Address Tracker
Subnet Calculator
Wake-on-LAN

Log & Security Information Management

Event Log Consolidator

I do own a copy of the Solarwinds Engineer’s Toolset which I can attest is an awesome collection of tools for everyday use.

IP Address Management

IP Address Management solutions are critical to managing today’s growing networks and their ever busy network administrators and engineers. IPAM solutions look to break away from the traditional spreadsheet and wasted repetitious data entry by providing a single point of management for IP addressing, DHCP/BOOTP and DNS. What makes the Solarwinds solution attractive is the competitive pricing and deep feature set.

Here are some of the highlights;

Centralized DHCP and DNS Management and Monitoring
Automated IP Address Scanning
IP Address Preventative Alerting
User Delegation
Detailed Event Recording
IP Address Reporting

IP Address Manager will manage and monitor your Microsoft DHCP and DNS servers. IP Address Manager will even allow you to monitor any Cisco DHCP servers.

I do a fair amount of consulting these days, usually for smaller organizations that don’t have the staffing and resources to handle the large projects or highly technical problems. One of the biggest shortcomings I see in the majority of customers is the lack of an IPAM solution. And what does that lead to? The dreaded IP address conflict, which should never really occur in a properly managed network.

I love the phrase “Say Goodbye to the Spreadsheet!” I would highly encourage anyone who has more than a 250 node network to seriously consider investing in an IPAM solution. It’s not going to break the bank and it will save you a lot of time and effort in the long run.

Thank you to Solarwinds for sponsoring this site and the discussion forums. Solarwinds sponsorship allows me to continue to operate and expand both this site and the discussion forums and greatly helps offset the financial burden that I would otherwise incur.

Disclosure: Solarwinds has purchased advertising space on this blog and the discussion forums. I’m a Solarwinds customer and have no reservations recommending their products as they work and work well. I would suggest you evaluate their products as you should any product before making any purchasing decisions.

Cheers!

HP OV NNM 9i Incident Configuration Action – Blat

Michael McNamara — Fri, 12 Aug 2011 03:05:09 +0000

I thought I would share the command line (action) I came up with for having HP Open View NNM send email notifications based on the various SNMP traps received by the management station.

blat.exe -server smtp.acme.org -to Alert@acme.org -cc HelpDesk@acme.org -html -subject "HPOV: UPS on battery ($snn)" -body "HP Open View NNM Alarm Incident Report

||Date: %date% %time% ($fot)
|Alarm: UPS on battery - power failure($name)
|Node: $snn($sln)
|IP: $mga ($oma)
|Contact: $sct
|Location: $slc

|Notes: generated by HP Open View NNM 9i management server.
|"

The command line above will utilize blat to send an HTML formatted email message to alert@acme.org with a copy to helpdesk@acme.org, with the body looking something similar to the figure to the right. I’ve sanitized the screenshot to protect the organization I’m currently employed with. The example above is for the SNMP trap ‘upsOnBattery’ while the image to the right is an example of the SNMP trap ‘powerRestored’. You’ll notice the | in the command line is interpreted as a CR/LF by NNM. Here are some of the parameters used in the above example;

$snn – node name of the object sending the SNMP trap
$fot – first occurrence time
$name – OID name of the trap received
$sln – DNS name of the node
$mga – management IP address
$oma – alternative management IP address
$sct – contact information for the object as stored in sysContact.0
$slc – location information for the object as stored in sysLocation.0

While it wasn’t too hard it did take some time to get all the formatting down and get it working reliably.

Cheers!

Remote Packet Capture with WireShark and WinPCAP

Michael McNamara — Sun, 05 Sep 2010 14:22:26 +0000

I’m just continually impressed with the quality of so many open source products available today. One such product that should be extremely high on any network engineer’s list is WireShark. WireShark has become the de-facto standard for packet capture software and is almost unrivaled in features and functionality.

Last week I had the task of diagnosing some very intermittent desktop/application performance issues at a remote site. I had installed WireShark locally on a few desktops but I wanted the ability to remotely monitor a few specific desktops, without obstructing the users’ workflow, to get a baseline for later comparison. I was excited to learn that WireShark and WinPCAP had (experimental) remote packet capture functionality built into each product.

I followed the instructions on the WireShark website by installing WinPCAP v4.1.2 on the remote machine and then starting the “Remote Packet Capture Protocol v.0 (experimental)” service. With that done I launched WireShark on my local desktop and configured the remote packet capture settings. From within WireShark I chose Capture -> Options and changed the Interface from Local to Remote. I then entered the IP address of the remote machine along with the TCP port (the default TCP port is 2002). I initially tried to use “Null authentication” but was unsuccessful. I eventually ended up choosing “Password authentication” and used the local Administrator account and password of the remote desktop that had WinPCAP installed on it.

If the remote desktop had multiple interfaces I could have selected which interface I wanted to perform the remote packet capture on. In this case the desktop in question only had an integrated Intel(R) 82567LM-3 network adapter. I clicked ‘Start’ and to my sheer amazement the packet trace was off and running, collecting packets from the remote desktop. There will still be the occasional need to place the Dolch (portable sniffer) onsite when the situation demands it, but this is a great tool to have available.

Cheers!

Updated: Sunday September 5, 2010
The images appear to be missing above because the URL paths are wrong; I’m not sure how WordPress messed that up. I don’t have time right now to fix it but I will fix it a little later.

Nortel ERS 8600 10GB and MRTG

Michael McNamara — Thu, 17 Jul 2008 02:00:33 +0000

I recently stumbled across a problem in MRTG when I was attempting to graph the utilization of a 10Gb interface on an 8683XLR card within an ERS 8600 switch. I noticed that the MaxBytes value that was listed in the MRTG configuration file was 176258176 (176.3 MBytes/s). While this was obviously incorrect I wondered how that value came to be.

I quickly fired up the trusty net-snmp toolset (snmpget) and confirmed from a CentOS Linux host that the ifSpeed OID was returning 1410065408, which is of course not the correct value for a 10Gb interface. The problem here is that a 32 bit integer can only hold a maximum value of 2^32 – 1 (4,294,967,295). While I believe other manufacturers will just return the maximum value (4,294,967,295), the Nortel switch appears to return a very odd value. I’m not exactly sure why it’s returning that value; perhaps someone from Nortel can help answer that question.
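
One plausible explanation, offered as an observation from the arithmetic rather than anything confirmed by Nortel: 10 Gb/s is 10,000,000,000 bits per second, and that number reduced modulo 2^32 is exactly 1410065408. Dividing by 8 gives the 176258176 bytes per second that ended up as MaxBytes. The numbers can be checked in the shell (64-bit shell arithmetic assumed):

```shell
# 10 Gb/s expressed in bits per second, as ifSpeed would report it
# if the value were wide enough to hold it.
speed=10000000000

# Keep only the low 32 bits, which is all a 32-bit ifSpeed value can carry.
truncated=$(( speed % 4294967296 ))
echo "$truncated"              # 1410065408

# MaxBytes is in bytes per second, so divide the bit rate by 8.
echo $(( truncated / 8 ))      # 176258176
```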

How to fix it?

The Perl script that builds the configuration files for MRTG, cfgmaker, requires a little editing so that it knows how to react if it receives the odd value of 1410065408 when it polls the ifSpeed OID. Here’s the necessary edit;

223c223
<             if ( (not defined $value) || ($value == 2**32-1)) {
---
>             if ( (not defined $value) || ($value == 2**32-1)  || ($value == 1410065408)  ) {

This edit will cause the configuration script to ignore the ifSpeed value and use the ifHighSpeed value which should return a proper value for MaxBytes of 1250000000 (1250.0 MBytes/s).

A quick word of warning here, you MUST use SNMP v2 with MRTG in order to see the ifHighSpeed OID.

Here’s an example of the parameters I use when calling the cfgmaker script;

/usr/local/mrtg/bin/cfgmaker --global "PathAdd: /usr/local/rrdtool/bin" --global "LibAdd: /usr/local/rrdtool/lib/perl" --global "WorkDir: /var/www/html/mrtg/mdc" --global "IconDir: /mrtg/" --global "WithPeak[_]: wmy" --global "LogFormat: rrdtool" --ifdesc=descr --no-down --zero-speed=100000000 --snmp-options=:::::2 snmpreadstring@sw-ers8600.dc.acme.org --output=sw-ers8600.dc.acme.org.cfg

The parameter “--snmp-options=:::::2” above tells cfgmaker and MRTG to use SNMP v2 when polling the switch so it can retrieve the 64 bit counters such as ifHCInOctets and ifHCOutOctets as well as ifHighSpeed.

Cheers!

Update: Thursday July 17, 2008

I went back and decided to test the 10GB XFP ports on the Nortel Ethernet Routing Switch 5530. Interestingly enough that switch returns a value of 4294967295 when you query the ifSpeed OID. The output produced by cfgmaker correctly identifies that port as MaxBytes 1250000000 (1250.0 MBytes/s). I may need to call Nortel myself just to satisfy my curiosity on this one.

Cheers!

Update: Wednesday July 23, 2008

It seems that this problem was corrected in software v4.1.5.4 and later for the Ethernet Routing Switch 8600. I was able to confirm that an ERS 8600 switch running v4.1.5.4 does return 4294967295 as expected. Here’s the blurb from the release notes;

MIB value for ifSpeed is now displayed correctly for any 10 Gigabit interface.(Q01174158-01)

Cheers!

Network Time Protocol (NTP)

Michael McNamara — Sun, 15 Jun 2008 14:00:00 +0000

I’m sometimes amazed at how many large organizations don’t have a centralized Network Time Protocol (NTP) server setup and devices configured appropriately. When troubleshooting a problem it’s vital that the timestamps in the logs for each switch, router, server and appliance match up correctly.

I’m currently using two CentOS Linux servers to provide time services to over 10,000 devices in the network. My two servers are themselves syncing up with pool.ntp.org over the Internet. With CentOS I didn’t need to build the software, I only needed to install the NTP package through YUM and then configure it appropriately. It was really easy, much easier than it was say 10 years ago when you had to compile the NTP software (University of Delaware) by hand, hoping you didn’t run into some missing library or version mismatch with the compiler.

We would first need to install the NTP software using YUM;
[root@hostname ]# yum install ntp

We would need to start the NTP daemons;
[root@hostname ]# service ntpd start

We would need to configure the server so the NTP software would start after every reboot;
[root@hostname ]# chkconfig ntpd on

With that step done we’d have ourselves an internal NTP server which would sync itself to the Internet (default configuration file in /etc/ntp.conf) and then our internal devices would sync to it.

Here are the CLI commands for configuring the ERS 8600 switch properly;

config bootconfig tz dst-name "EDT"
config bootconfig tz name "EST"
config bootconfig tz offset-from-utc 300
config bootconfig tz dst-end M11.1.0/0200
config bootconfig tz dst-start M3.2.0/0200

config ntp server create a.b.c.d
config ntp server create a.b.c.d
config ntp server create a.b.c.d
config ntp enable true

I’ve added the two configuration statements for the new Daylight Saving Time changes that took effect in 2007. Please also note that I’m in the Eastern timezone (EDT/EST), so if you’re not in the Eastern timezone you would need to substitute your timezone abbreviation appropriately.

Here are the commands for an ES460,ES470,ERS4500 or ERS5500 series switch

5520-48T-PWR# config terminal
5520-48T-PWR (config)# sntp server primary a.b.c.d
5520-48T-PWR (config)# sntp server secondary a.b.c.d
5520-48T-PWR (config)# sntp enable
5520-48T-PWR (config)# exit
5520-48T-PWR#

The ERS 4500/5500 Series now supports Daylight Saving Time. This feature is NOT supported on the ES460 and ES470 switches. CORRECTION: this feature is supported on the ES460/470 as of v3.7.x software; please see the update at the bottom of this post for additional information. If you wanted to configure the timezone on the ERS4500/ERS5500 switch you would use the following commands;

5520-48T-PWR>enable
5520-48T-PWR# config terminal
5520-48T-PWR (config)# clock time-zone EST -5
5520-48T-PWR (config)# clock summer-time EDT date 9 Mar 2008 2:00 2 Nov 2008 2:00 +60
5520-48T-PWR (config)# exit
5520-48T-PWR#
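
The clock summer-time command above takes explicit calendar dates for 2008, so it looks as though the dates would need revisiting each year (an assumption based on the form of the command, not something checked against Nortel documentation). Under the US rule that took effect in 2007, Daylight Saving Time starts on the second Sunday in March and ends on the first Sunday in November. A minimal sketch for working out those days for any year follows; the function names are made up for illustration and the day-of-week calculation uses Sakamoto's method.

```shell
# day_of_week YEAR MONTH DAY -> 0 (Sunday) .. 6 (Saturday)
# Sakamoto's method for the Gregorian calendar.
day_of_week() {
    y=$1; m=$2; d=$3
    # Month offsets for January..December; pick the one for month m.
    set -- 0 3 2 5 0 3 5 1 4 6 2 4
    shift $(( m - 1 ))
    t=$1
    # January and February are treated as part of the previous year.
    if [ "$m" -lt 3 ]; then
        y=$(( y - 1 ))
    fi
    echo $(( (y + y/4 - y/100 + y/400 + t + d) % 7 ))
}

# nth_sunday YEAR MONTH N -> day of the month of the Nth Sunday
nth_sunday() {
    first=$(day_of_week "$1" "$2" 1)
    echo $(( (7 - first) % 7 + 1 + ($3 - 1) * 7 ))
}

nth_sunday 2008 3 2    # 9 (second Sunday in March 2008)
nth_sunday 2008 11 1   # 2 (first Sunday in November 2008)
```

Both results match the 9 Mar 2008 and 2 Nov 2008 dates used in the switch configuration above.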

You can use “show sntp” and “show clock” on the ERS 5500 Series switch to check out your changes;

5530-24TFD#show sntp
SNTP Status:                      Enabled
Primary server address:         10.1.20.1
Secondary server address:     10.1.20.1
Sync interval:                      24 hours
Last sync source:                 10.1.20.1
Primary server sync failures:    0
Secondary server sync failures: 0
Last sync time:                  2008-06-14 14:47:31 GMT-04:00
Next sync time:                  2008-06-15 14:47:31 GMT-04:00
Current time:                     2008-06-15 13:52:24 GMT-04:00

5530-24TFD#show clock
Current SNTP time  :    2008-06-15 13:52:29 GMT-04:00
Summer time is set to:
start: 28 March 2007 at 02:00
end: 30 August 2008 at 15:00
Offset: 60 minutes. Timezone will be 'EDT'
Time Zone is set to 'EST', offset from UTC is -05:00

Hopefully this will provide a brief look into NTP and SNTP and you’ll agree that it really isn’t that hard to setup and configure properly.

Cheers!

Update: June 17, 2008

After posting the article above I decided I would confirm that the Daylight Saving Time feature was not available on the Nortel Ethernet Switch 460/470. I found that as of v3.7.x software the feature is supported on the switches. The configuration commands are identical to the ERS4500/ERS5500 switches. Here’s an example specifically for the Eastern timezone.

470-48T>enable
470-48T#config term
Enter configuration commands, one per line.  End with CNTL/Z.
470-48T(config)#clock time-zone EST -5 00
470-48T(config)#clock summer-time EDT date 9 Mar 2008 02:00 2 Nov 2008 2:00 +60
470-48T(config)#show clock summer-time
Summer time is set to:
start: 9 March 2008 at 02:00
end: 2 November 2008 at 02:00
Offset: 60 minutes. Timezone will be 'EDT'
470-48T(config)#exit

Cheers!

Multi Router Traffic Grapher & RRD

Michael McNamara — Sun, 15 Jun 2008 02:00:00 +0000

I recently needed to share some network utilization data with some non IT folks in our organization. I produced a quick report from a dynamic HTML page that contained multiple MRTG graphs. Needless to say the graphs did most of the talking while I just answered the questions. One person commented that they didn’t know we had purchased such an elaborate management and monitoring solution. In short we hadn’t purchased any high-end management or monitoring solution, but we had setup Multi Router Traffic Grapher (MRTG) and Round Robin Database (RRD) both of which were written by Tobi Oetiker with contributions from many others.

I’ve been personally using MRTG for well over 10 years now and I’ve yet to find any product (commercial or open source) that comes close. These two tools work hand in hand to help me graph and chart almost any SNMP value (you can also graph non-SNMP values but you need a script or something to collect the values) on almost any device connected to the network. The obvious use for network engineers and architects such as myself is to have MRTG/RRD help monitor current network utilization and forecast future growth. There are other examples such as graphing the temperature of a computer room or even the amount of rainfall. There are literally hundreds of examples but I’ll leave you to enjoy reading about them all on the MRTG web site.

Here are two quick examples:

Internet Link with XO (Ethernet ~ 50Mbps)

Internet Link with Level3 (Ethernet ~ 50Mbps)

In the above figures MRTG is graphing the average of ifInOctets and ifOutOctets over a 5 minute interval. As I said above you could graph almost any value you wished.

I also use MRTS by Thor Dreier to help get an idea of how much actual data is traversing a specific network or interface. When we recently installed an HP MAS (Medical Archive Solution), which was built around grid computing and virtual storage technologies, we observed a 300% increase in WAN traffic as the MAS was replicating data for business continuity purposes.

I will admit that MRTG can be somewhat complicated for the fledgling network engineer. However, there are dozens of implementation guides now available on the MRTG web site, including support for running MRTG on Windows.

Cheers!

Perl Script to poll ARP Table

Michael McNamara — Mon, 05 May 2008 14:00:00 +0000

I’ve written a lot of Perl scripts to help make managing the network easier and more efficient. One of the scripts I’ve written allows me to dump the IP ARP table of the Nortel Ethernet Routing Switch 8600 to a file for later/additional processing. While the script was originally written for the ERS 8600 switch it will also work on just about any router (Layer 3 device) that supports RFC1213 (ipNetToMediaNetAddress).

The script has been tested and works on Nortel’s BayRS routers (ARN, ASN, BLN, BCN). You just need to be careful about how the script interprets the ipNetToMediaIfIndex value depending on the device you are polling.

The script get8600arp.pl is very straightforward. It simply polls various SNMP OIDs and then stores the results in a file. It does this for every switch (FQDN/IP Address) that is listed in the input file.

#!/usr/bin/perl
#
# Filename: /root/get8600arp.pl
#
# Purpose:  Query Nortel Ethernet Routing Switch 8600 for the IP ARP
#           table via SNMP. This script will poll a list of devices
#           (input file) and dump the contents of the IP ARP table to
#           an output file.
#
# Author:   Michael McNamara
#
# Date:     December 5, 2002
#
# Support Switches:
#           - Nortel ERS 8600
#           - Nortel ERS 1600
#           - Nortel ERS 5500
#           - Nortel BayRS Routers
#
# Requirements:
#           - Net-SNMP
#           - Net-SNMP Perl Module
#           - SNMP-MIBS
#
# Changes:
#
#           - May  5, 2007 (M.McNamara)
#           clean up code and documentation for release to public
#           - Oct 10, 2006 (M.McNamara)
#           went back to SNMP v1 to support BayRS legacy routers
#           - Sep 04, 2003 (M.McNamara)
#           migrated from vendor specific MIB to RFC1213 (ipNetToMediaNetAddress)
#

# Load Modules
use strict;
use SNMP;
use Net::Ping;

# Declare constants
#use constant DEBUG      => 0;           # DEBUG settings
use constant RETRIES    => 3;           # SNMP retries
use constant TIMEOUT    => 1000000;     # SNMP timeout, in microseconds
use constant SNMPVER    => 1;           # SNMP version

# SNMP Settings
$SNMP::verbose = 0;
$SNMP::use_enums = 1;
$SNMP::use_sprint_value = 0;
&SNMP::initMib();
&SNMP::loadModules('RAPID-CITY');

# Declaration Variables
my ($sess, @vals);
my @devices;
my ($card, $port);
my $snmphost;
my $comm = "public";        # SNMP ReadOnly Community String
my %array;
my $switchfile;
my $datafile;

our $DEBUG;                     # DEBUG flag

undef @devices;

# Program and help information
my $program = "get8600arp.pl";
my $version = "v1.3";
my $author = "Michael McNamara";
my $purpose = "This Perl script retrieves the IP ARP table from the ERS8600 Layer 3 switch/router and stores it in a file for later use.";
my $usage = "Usage: $program \[input\] \[output\] \[-help\] \[debug\]\n  input  = filename listing each switch to poll\n  output = filename where to store output\n";

# The input and output filenames are required; the debug flag is optional
if (@ARGV < 2) {
 print "Program: $program \nVersion: $version \nWritten by: $author \n$purpose\n\n$usage\n";
 print "DEBUG: ARGV =  $#ARGV\n";
 print "DEBUG: ARGV =  $ARGV[0] $ARGV[1] $ARGV[2] $ARGV[3]\n";
 exit;
}

my $arg1 = shift @ARGV;
my $arg2 = shift @ARGV;
my $arg3 = shift @ARGV;

if ($arg1 =~ /help/) {
 print "Program: $program \nVersion: $version \nWritten by: $author \n$purpose\n\n$usage\n";
 print "DEBUG: ARGV =  @ARGV\n";
 print "DEBUG: ARGV =  $ARGV[0] $ARGV[1] $ARGV[2] $ARGV[3]\n";
 exit;
}

$switchfile = $arg1;
$datafile = $arg2;
$DEBUG = $arg3;

# Test to see if input file exists
if (!-e $switchfile) {
 die "ERROR: Unable to locate and/or open inputfile $switchfile...";
}

############################################################################
##### B E G I N   M A I N ##################################################
############################################################################

&load_switches;

&collect_arp;

exit 0;

############################################################################
#### E N D   M A I N #######################################################
############################################################################

############################################################################
# Subroutine collect_arp
#
# Purpose: collect ARP information from layer 3 switches/routers
############################################################################
sub collect_arp {

 # Open output datafile for appending
 open(DATAFILE, ">>$datafile") or die "ERROR: Unable to open output file $datafile: $!";

 # Loop over each Passport 8600 switch
 foreach $snmphost (@devices) {

    my $packet = Net::Ping->new('icmp');

    $snmphost =~ s/[\r\n]+//g;   # remove CR/LF

    if ($packet->ping($snmphost)) {

       $sess = new SNMP::Session (    DestHost   =>  $snmphost,
                              Community  =>  $comm,
                              Retry      =>  RETRIES,
                              Timeout    =>  TIMEOUT,
                              Version    =>  SNMPVER );

       my $vars = new SNMP::VarList(
                              ['ipNetToMediaIfIndex', 0],
                              ['ipNetToMediaPhysAddress', 0],
                              ['ipNetToMediaNetAddress', 0],
                              ['ipNetToMediaType', 0] );

       while (1) {

          @vals = $sess->getnext($vars);  # retrieve SNMP information

          last unless ($vars->[0]->tag eq 'ipNetToMediaIfIndex');

          $vals[1] = unpack('H12', $vals[1]);
          $vals[1] =~ tr/a-z/A-Z/;

          $card = (($vals[0] & 62914560) / 4194304);
          $port = (($vals[0] & 4128768) / 65536) + 1;

          print "$snmphost, $vals[0], ($card/$port), $vals[1], $vals[2], $vals[3]\n" if ($DEBUG);
          print DATAFILE "$snmphost, $vals[0], $card, $port, $vals[1], $vals[2]\n";

          $array{$snmphost}[$card][$port] = $vals[2];

       } # end while

    } else {

       print ("ERROR: $snmphost not responding to ICMP ping skipping...\n");

    } #end if $packet

 } #end foreach

 close(DATAFILE);

} #end sub collect_arp

############################################################################
# Subroutine load_switches
#
# Purpose: load list of switches
############################################################################
sub load_switches {

 open(SWITCHLIST, "<$switchfile") or die "ERROR: Unable to open input file $switchfile: $!";

 # Walk through data file
 while (<SWITCHLIST>) {

    # Skip blank lines
    next if (/^\s*$/);
    # Skip comments
    next if (/^#/);

    #print "DEBUG: adding $_ to our list of devices \n" if ($DEBUG);

    push (@devices, $_);

 }

 close(SWITCHLIST);

 return 1;

} # end sub load_switches
############################################################################

The real magic that folks have always been searching for is the binary formula to turn the ipNetToMediaIfIndex into a location that denotes the card and port where that specific device is connected.

$card = (($vals[0] & 62914560) / 4194304);
$port = (($vals[0] & 4128768) / 65536) + 1;
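For anyone who finds the decimal masks hard to read, here is the same arithmetic as a small Python sketch. The card mask 62914560 is 0x03C00000 (bits 22 to 25) and the port mask 4128768 is 0x003F0000 (bits 16 to 21), so the divisions are really just right shifts. The function name and the sample value are made up for illustration.

```python
CARD_MASK = 62914560   # 0x03C00000, bits 22-25
CARD_DIV = 4194304     # 2**22
PORT_MASK = 4128768    # 0x003F0000, bits 16-21
PORT_DIV = 65536       # 2**16


def decode_ifindex(value):
    """Split an ERS 8600 ipNetToMediaIfIndex value into (card, port)."""
    card = (value & CARD_MASK) // CARD_DIV
    port = (value & PORT_MASK) // PORT_DIV + 1   # port field is zero-based
    return card, port


# Hypothetical value: 3 * 4194304 + 11 * 65536
print(decode_ifindex(13303808))  # (3, 12)
```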

While I still use flat files you could certainly adapt this code to dump the output into a database. I just haven’t had the time although I’ve been playing with MySQL quite a bit lately.
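As a rough idea of what that could look like, here is a minimal sketch that loads the comma separated lines written by the script into a database. It uses Python and the SQLite module from the standard library as a stand-in for MySQL; the table name, column names and sample ifIndex value are all assumptions for illustration.

```python
import sqlite3


def load_arp_lines(conn, lines):
    """Insert 'host, ifindex, card, port, mac, ip' lines into an arp table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS arp ("
        "switch TEXT, ifindex INTEGER, card INTEGER, "
        "port INTEGER, mac TEXT, ip TEXT)"
    )
    rows = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        rows.append(fields)
    conn.executemany("INSERT INTO arp VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# In-memory database and a made-up sample line in the script's output format
conn = sqlite3.connect(":memory:")
sample = ["sw-ccr-8600.datacenter.acme.org, 13303808, 3, 12, 000AE4753FC9, 1.1.1.10"]
load_arp_lines(conn, sample)
print(conn.execute(
    "SELECT switch, card, port FROM arp WHERE ip = ?", ("1.1.1.10",)
).fetchall())
```

In practice you would pass the open output file instead of the sample list, and a real deployment would probably want a timestamp column as well.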

Cheers!

How to rlogin to a Nortel Call Server

Michael McNamara — Thu, 01 May 2008 00:00:00 +0000

In this post I’ll explain how to remotely connect to a Nortel Succession Call Server 1000 (Meridian 1 Option11C, Option61C, Option81C) over the network. The Nortel Succession Call Server will obviously need to be connected to the network and you’ll need to know the IP address or FQDN (Fully Qualified Domain Name). Nortel has traditionally frowned upon connecting the ELAN to any large network (internal or external). You may want to configure some IP filters to protect the ELAN from any unnecessary traffic.

We’ll be using PuTTY to rlogin into the Nortel Succession Call Server from our Windows XP desktop.

Note: you could use any operating system that supports rlogin using the parameters provided below.

Once you have PuTTY installed you’ll need to create a session, which is essentially a profile that can be stored for quick access later.

After installation you can run PuTTY from Start -> Programs -> PuTTY -> PuTTY

The example above is for the Nortel Succession Call Server v4.5 (formerly Meridian 1 Option81C PBX). The DNS name of the core CPU in the example above is “pbx-intf1”.

Step 1. Enter the “Host Name” into the dialog box. (example; pbx-intf1.acme.org)

Step 2. Select the Protocol “Rlogin”

Step 3. Save the Session by giving it a name and clicking “Save”
(I usually use the FQDN for the session name making it easy to recall later)

Step 4. Click on the “Connection” text in the left menu tree.

After clicking the “Connection” tree you’ll see a window similar to above.

Step 5. Set the “Auto-login username” to “CPSID”

Step 6. Click on the “Rlogin” text in the tree menu to the left

Step 7. Set the “Local username” to “CPSID”

Step 8. Click on the “Session” text in the tree menu to the left.

Step 9. Click on Save to save the session configuration.

All you need to do now is click “Open” and PuTTY will establish an RLOGIN session with the Nortel Succession Call Server.

You will still need to log into the console using the “LOGI” command.

Cheers!

Wireless Packet Traces (AirPcap)

Michael McNamara — Fri, 04 Apr 2008 02:00:00 +0000

I thought I would take some time to shamelessly plug a product that I recently purchased for my organization.

We are currently working through an issue that is affecting our Nortel 2211 Wireless telephones on our Motorola RFS7000 Wireless LAN Switch. In short it appears that the phone is resetting itself for unknown reasons. The problem is very intermittent and sporadic, hence it’s very difficult to recreate. The vendors involved in the problem, Motorola, Nortel and Polycom (Spectralink), are all asking for wireless traces of the problem. In order to capture the problem we need four laptops: three laptops tracing on each of the wireless channels in the 802.11b 2.4GHz spectrum and one laptop tracing on the LAN side of the RFS7000. Needless to say, that is a lot of hardware to set up. And the wireless laptops really need to physically move with the wireless telephone as it moves through the building (wireless network).

Then I heard that CACE Technologies had a hardware solution that worked with WireShark and allowed for simultaneous packet capture on all three 802.11b channels. Using three AirPcapEx USB adapters I could use a single laptop to capture all three 802.11b channels saving me a lot of hardware and a lot of time trying to assemble/merge the different packet traces.

I’ve been using the solution for the past week and it seems to work well. It was perfect timing because WireShark v1.0 was released earlier this week. Even though it’s a single laptop it can still be a bit of a logistical pain with the three USB adapters and the three antennas. I got some really interesting stares walking around the building with this octopus-looking thing on top of the laptop keyboard.

Cheers!

Windows Sysinternals – TCPView

Michael McNamara — Sat, 23 Feb 2008 15:00:00 +0000

In this day and age network problems usually require me to look at everything in the picture including the source and destination device which is usually a Windows PC or server.

One set of tools that I’ve found invaluable is Microsoft’s Windows Sysinternals. They include a large number of utilities for all areas of system administration. I’d like to focus on just one of those utilities, TCPView for Windows v2.53.

TCPView is a Windows program that will show you detailed listings of all TCP and UDP endpoints on your system, including the local and remote addresses and state of TCP connections. On Windows Server 2008, Vista, NT, 2000 and XP TCPView also reports the name of the process that owns the endpoint. TCPView provides a more informative and conveniently presented subset of the Netstat program that ships with Windows. The TCPView download includes Tcpvcon, a command-line version with the same functionality.

While netstat will work in a pinch, TCPView is really nice in that it will show you connections just opened (highlighted in green) and connections that are just closed (highlighted in red). It also shows you the process that is making or attempting to make the connection.

If you’re using a non-GUI connection or console you can use tcpvcon.exe to dump the same output to a console. This can be very useful if you are remotely administering a server over a telnet/SSH connection.

Cheers!

What are the ARP and FDB tables?

Michael McNamara — Sun, 17 Feb 2008 15:00:00 +0000

I’ll try to describe and explain the purpose behind the ARP and FDB tables in networking. I will be the first to admit that there are probably much better descriptions that can be found elsewhere on the net.

The ARP (Address Resolution Protocol) table is used by a Layer 3 device (router, switch, server, desktop) to store the IP address to MAC address mappings for the network devices it talks to. The ARP table allows a device to resolve a Layer 3 address (IP address) into a Layer 2 address (MAC address). The ARP table is populated as devices issue ARP broadcasts looking for a network device’s Layer 2 address (MAC address).

How does it work? When a Layer 3 device has an IP packet that it needs to deliver to a locally attached interface it will look to the ARP table to figure out what MAC address to put into the frame header. The important point above is “a locally attached interface”. If the IP packet is destined for a remote network it will be routed per the routing table. If there is no ARP table entry for the destination IP address the Layer 3 device will try ARP broadcasting for it. Once it has the MAC address for that specific IP address it will forward the packet with the appropriate MAC address in the frame header. For example, you can list the ARP table of a Windows XP computer by using the following command at the DOS prompt: “arp -a”.
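The local versus remote decision can be sketched in a few lines of Python. This is a simplified illustration only: a single default gateway stands in for the routing table, the ARP table is a plain dictionary, and all of the addresses and names are made up.

```python
import ipaddress


def resolve_next_hop(dest_ip, interface, arp_table, gateway_ip):
    """Return (ip_to_resolve, mac); mac is None when an ARP broadcast is needed."""
    network = ipaddress.ip_interface(interface).network
    if ipaddress.ip_address(dest_ip) in network:
        target = dest_ip        # locally attached: resolve the destination itself
    else:
        target = gateway_ip     # remote network: resolve the next-hop router
    return target, arp_table.get(target)


# Hypothetical host 10.1.1.5/24 with two cached ARP entries
arp_table = {"10.1.1.10": "000AE4753FC9", "10.1.1.1": "001122334455"}
print(resolve_next_hop("10.1.1.10", "10.1.1.5/24", arp_table, "10.1.1.1"))
print(resolve_next_hop("192.168.5.9", "10.1.1.5/24", arp_table, "10.1.1.1"))
print(resolve_next_hop("10.1.1.20", "10.1.1.5/24", arp_table, "10.1.1.1"))
```

The third call returns no MAC address, which is the case where the device would send an ARP broadcast before it could forward the packet.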

The FDB (forwarding database) table is used by a Layer 2 device (switch/bridge) to store the MAC addresses that have been learned and the port each MAC address was learned on. The MAC addresses are learned through transparent bridging on switches and dedicated bridges.

How does it work? When an Ethernet frame arrives at a Layer 2 device, the Layer 2 device will inspect the destination MAC address of the frame and look to its FDB table for information on where to send that specific Ethernet frame. If the FDB table doesn’t have any information on that specific MAC address it will flood the Ethernet frame out to all ports in the broadcast domain (except the port it arrived on).
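Here is a toy model of that learn, forward and flood behavior in Python. It ignores VLANs, entry aging and everything else a real switch does; the class name, port numbers and MAC addresses are invented for the example.

```python
class LearningSwitch:
    """Toy transparent bridge with a single broadcast domain."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # MAC address -> port it was learned on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.fdb[src_mac] = in_port          # learn where the sender lives
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            return []                        # destination is on the same port
        return [out_port]


switch = LearningSwitch([1, 2, 3])
print(switch.receive(1, "00:00:00:00:00:0A", "00:00:00:00:00:0B"))  # [2, 3]
print(switch.receive(2, "00:00:00:00:00:0B", "00:00:00:00:00:0A"))  # [1]
print(switch.receive(1, "00:00:00:00:00:0A", "00:00:00:00:00:0B"))  # [2]
```

The first frame is flooded because the switch has not yet seen the destination; once the reply teaches it where that MAC address lives, later frames go out of a single port.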

A Layer 3 switch performs both the routing and switching in a single device. It will typically have both an ARP and FDB table and it will perform both tasks depending on whether the packet/frame needs to be routed or switched. The Nortel Ethernet Routing Switch 8600 is a Layer 3 switch while the Nortel Ethernet Switch 470 is a Layer 2 switch. The Nortel Ethernet Routing Switch 5500 Series is also a Layer 3 device that can be used as a Layer 2 device if desired.

Let me point out that Wikipedia is a great resource these days for an amazing number of topics. It’s a world-wide collaborative effort with over 75,000 contributors. Anyone can sign-up and contribute content in whatever subject material they are knowledgeable in. It’s probably best described as the world’s largest growing online encyclopedia.

Have a look at the following Wikipedia entry;

http://en.wikipedia.org/wiki/Ethernet

There is an amazing amount of information in those articles with an equally amazing amount of detail. Thanks to everyone who contributes to Wikipedia!

Cheers!

ERS 8600 (ipNetToMediaIfIndex)

Michael McNamara — Thu, 10 Jan 2008 04:00:00 +0000

There was a recent comment about a Usenet posting I made back in 2002 in comp.protocols.snmp.

In the post I was responding to someone looking for information on how to decode the value returned from the ipNetToMediaIfIndex when querying an ERS 8600 switch. Thankfully Shane (Nortel) was able to help me come up with the formula.

card = ( $value AND 62914560 ) / 4194304
port = (( $value AND 4128768) / 65536 ) + 1

With that formula you can walk the ipNetToMediaTable and retrieve the entire ARP table, giving you the card and port number, MAC address, and IP address for each entry in the table.

The next issue was how to deal with MultiLink Trunk interfaces. In this case (and with my current software code) I build a table of all the MLT interfaces prior to polling the ipNetToMediaTable. I still use Perl but it shouldn’t be very hard to convert to PHP.

# rcMltNumMlts
$nummlts = $sess->get("rcMltNumMlts.0");

for ($i = 1; $i <= $nummlts; $i++) {
  # rcMltName         
  $mltname[$i] = $sess->get("rcMltName.$i");
  # rcMltId
  $mltindex[$i] = $sess->get("rcMltId.$i");
  # rcMltIfIndex
  $mltifindex[$i] = $sess->get("rcMltIfIndex.$i");
  print "DEBUG: MltId = $i and MltName = $mltname[$i] and MltIndex = $mltindex[$i] and MltIfIndex = $mltifindex[$i]\n" if ($DEBUG);
};

Now that we have the rcMltTable in an array we can walk the ipNetToMediaTable and match up any entries. Here’s the code I use (again it’s Perl but you should be able to convert to PHP);

# Evaluate with bitwise operation
$card = (($vals[0] & 62914560) / 4194304);
$port = (($vals[0] & 4128768) / 65536) + 1;

# Evaluate to determine if port is a MLT
if ($card != 0) {
  $intf = (((64 * $card) + $port) - 1);
  print "DEBUG: $vals[1] address found on card $card port $port\n";
} else {
  $mlt = 1;
  print "DEBUG: $vals[1] address found on MLT $mltname[$port]\n";
} # end else

Hopefully that doesn’t look too complicated. The important piece here is that you need to merge the rcMltTable with the ipNetToMediaTable to get your results. If you name the MLT with something meaningful you can then return that string to the application that is making the query.
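If Perl isn’t your thing, the merge step might look something like this in Python. This is only a sketch: the dictionary stands in for the rcMltName values collected from the switch, and the MLT index and ifIndex values are made up for illustration.

```python
def locate(ifindex, mlt_names):
    """Describe where an ARP entry was learned: a card/port or a named MLT."""
    card = (ifindex & 62914560) // 4194304
    port = (ifindex & 4128768) // 65536 + 1
    if card != 0:
        return f"card {card} port {port}"
    # A card value of 0 means the entry sits on a MultiLink Trunk,
    # in which case the port value is used as the index into the MLT table
    return f"MLT {mlt_names.get(port, 'unknown')}"


# Hypothetical MLT table: index -> rcMltName
mlt_names = {5: "SMLT-5500"}
print(locate(13303808, mlt_names))  # card 3 port 12
print(locate(262144, mlt_names))    # MLT SMLT-5500
```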

I wrote a Perl application that would search the ARP table of an Ethernet Routing Switch 8600 dynamically for a specific IP address entry. Here’s an example of the output;

Nortel Passport 8600 Gigabit Switch IP ARP Table Search

Initializing query for sw-ccr-8600.datacenter.acme.org for IP address 1.1.1.10...

sysDescr = ERS-8610 (4.1.3.0)
sysObjectID = .1.3.6.1.4.1.2272.30
sysUpTime = 169 Days 6 Hours 43 mins 11 secs
sysContact = Acme Network Infrastructure Team
sysName = sw-ccr-8600.datacenter.acme.org
sysLocation = USA

Please be patient it may take a while to complete the search...

DEVICE FOUND

1.1.1.10 (000AE4753FC9) address found on MLT SMLT-5500

We searched through 1183 forwarding records...

That's all folks!

I will look to publish the complete code on my website sometime in the near future.

Cheers!

WISP/CAPWAP Protocol (Ethereal)

Michael McNamara — Sun, 23 Dec 2007 03:00:00 +0000

While writing the previous article I recalled all the problems I had trying to decode the Motorola (formerly Symbol) WISP, WISPe and CAPWAP protocols used between the Wireless LAN Switch and their Access Ports.

As of WireShark version 0.99.7 there is decode support for the Lightweight Access Point Protocol (LWAPP) used by Airespace (Cisco) and a few other wireless vendors.

The legacy Motorola Wireless LAN WS5000, WS5100 switches (version 1.x and 2.x) utilize the WIreless Switch Protocol (WISP) while the Motorola Wireless LAN WS5100, RFS7000 (version 3.x and 1.x respectively) utilize the WIreless Switch Protocol Enhanced (WISPe). The WISPe protocol from Motorola very closely mimics the Control and Provisioning of Wireless Access Points (CAPWAP) that is currently being developed by the IETF.

Now that I’ve got that history lesson out of the way, have you ever needed to decode the protocol running between the Wireless Switch and the Access Ports?

As you know by now I have a large number of Motorola Wireless LAN switches and Access Ports deployed throughout my organization. Unfortunately the latest version of WireShark does not support the decoding of WISP, WISPe, or CAPWAP.

Thankfully Ethereal v0.10.14 has decoders for the WISP and CAPWAP protocols. One warning though: I have downloaded multiple copies of Ethereal v0.10.14 and some seem to support WISP and CAPWAP while others don’t. If I find a link for a working version I’ll update this article.

Here’s an example of the WISP protocol between a Motorola Wireless LAN Switch (WS5000 v2.x) and an Access Port 300 (AP300). (click on the image to enlarge it)

In the above trace you can see that the AP300 has just been reset and is in the process of booting. It starts by issuing EAPOL and LLDP packets before sending its first WISP “Hello”. You can see that the WS5000 responds to the “Hello” with a “Parent” command after which the AP300 starts to download its runtime software with the “LoadMe” command.

Here’s an example of the CAPWAP protocol between a Motorola Wireless LAN Switch (WS5100 v3.x) and an Access Port 300 (AP300). (click on the image to enlarge it)

Note: this trace was not performed at the port level so we don’t see the EAPOL or LLDP traffic. We can see the AP300 making “Discovery”, “Join” and “Cfg” requests of the WS5100 switch.

Cheers!

UPDATE: March 29, 2008

Here’s a link for Ethereal v0.10.14 that I believe should decode both WISP and CAPWAP;

http://www.michaelfmcnamara.com/files/wisp-ethereal-setup-0.10.14.exe

UNISTIM Protocol (WireShark)

Michael McNamara — Sat, 22 Dec 2007 03:00:00 +0000

The folks behind WireShark have released version 0.99.7 for Windows. WireShark (formerly Ethereal) is the de facto standard network protocol analyzer today. I personally use WireShark and WildPackets’ OmniPeek depending on the situation or scenario.

Why the excitement behind the new release?

Well, for those of us that have tried in vain for many years to decode the UNISTIM protocol, the latest release of WireShark promises to deliver us from our purgatory. The complete release notes can be found here. I’ll include just the pertinent part here:

New Protocol Support

ANSI TCAP, application/xcap-error (MIME type), CFM, DPNSS, EtherCAT, ETSI e2/e4, H.282, H.460, H.501, IEEE 802.1ad and 802.1ah, IMF (RFC 2822), RSL, SABP, T.125, TNEF, TPNCP, UNISTIM, Wake on LAN, WiMAX ASN Control Plane, X.224

You can find an entry for UNISTIM on WireShark’s Wiki here along with an entry on Wikipedia here.

In summary UNIStim is Nortel’s proprietary VoIP signaling protocol between their Internet Telephones (i2002,i2004,i2007,1120e,1140e,1150e) and the Nortel Call Server (PBX) switch. The Internet Telephones and Call Server still utilize the Real-time Transport Protocol (RTP) for the actual voice path between two Internet phones or from a Voice Gateway Media Card (VGMC) to an Internet phone.

Let me provide an example of the new decode; (click on the image to see it blown up)

This trace was taken by mirroring the port connecting to an i2004 Internet Telephone. In the trace you will see that the top frames are an RTP stream between the i2004 (10.101.245.132) and a VGMC (10.117.240.43). The frame I’ve highlighted shows the Signaling Server (10.101.240.20) sending a UNIStim signal to the i2004 to close the audio channel. You can see that in the next packet the far end (10.117.240.43) has already closed its RTP (UDP) port, generating an ICMP unreachable message back to the i2004 phone.

Many thanks to Gerald Combs and all the contributors over at WireShark!

Cheers!

SNMP MIBS

Michael McNamara — Mon, 26 Nov 2007 16:00:00 +0000

I know what a pain it can be to sometimes locate vendor specific SNMP MIBS. In the past I’ve sometimes spent hours scouring the net and vendor sites looking for the MIBS.

I’ve decided to post some of the vendor specific SNMP MIBS that I work with on my homepage. You should be able to link straight to them with this URL:

http://blog.michaelfmcnamara.com/mibs/

You should be able to find SNMP MIBS for the following devices;

Nortel Ethernet Routing Switch 8600 (v4.1.4)
Nortel Ethernet Routing Switch 5500 Series (v5.1)
Motorola WS5100 Wireless LAN Switch (v3.0.3)
Motorola RFS7000 Wireless LAN Switch (v1.x)
APC UPS Management Cards (v387)

As time and disk space allow I will add additional vendor MIBS and additional devices.

Update 12/01/07

Polycom VXS8000 Video Conferencing System
Blue Coat ProxySG Appliance
Blue Coat ProxyAV Appliance

Update 12/07/07

Nortel Application Switch (v23.2.3.1)

Update 12/26/07

Nortel Ethernet Switch 460/470 (v3.7)
Nortel Ethernet Routing Switch 1600 (v2.1.4)
Nortel Succession Call Server (v4.5)

Update 12/29/2007

Motorola WS5000/WS5100 Wireless LAN Switch (v2.1.3)

Cheers!