I’ve seen quite a few issues with VLACP on the Nortel Ethernet Routing Switches but now Nortel has released a technical bulletin documenting a known issue when running VLACP on their stackable switches (ERS2500, ERS4500, ERS5500, ERS5600) with their chassis based ERS8300 and ERS8600 switches.
The bulletin advises users to re-configure the VLACP timeout scale from the default value of 3 to 5.
5520-48T-PWR(config)#interface fastEthernet 5-6
5520-48T-PWR(config-if)#vlacp timeout-scale 5
5520-48T-PWR(config-if)#show vlacp interface 5-6
===============================================================================
                              VLACP Information
===============================================================================
PORT ADMIN   OPER    HAVE    FAST  SLOW  TIMEOUT TIMEOUT ETH  MAC
     ENABLED ENABLED PARTNER TIME  TIME  TYPE    SCALE   TYPE ADDRESS
-------------------------------------------------------------------------------
5    true    true    yes     500   30000 short   5       8103 00:00:00:00:00:00
6    true    true    yes     500   30000 short   5       8103 00:00:00:00:00:00
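If you have many uplinks to check, the rows of the "show vlacp interface" output above can be parsed with a short script. The following is only a minimal sketch in Python: it assumes the column order shown above (port, admin, oper, partner, fast time, slow time, timeout type, timeout scale, ethertype, MAC), and the function names parse_vlacp_rows and ports_needing_change are hypothetical helpers, not part of any Nortel tool.

```python
import re

# Column order assumed from the "show vlacp interface" output in this post:
# PORT ADMIN OPER PARTNER FAST SLOW TIMEOUT-TYPE TIMEOUT-SCALE ETHERTYPE MAC
ROW = re.compile(
    r"^\s*(?P<port>\d+(?:/\d+)?)\s+(?P<admin>true|false)\s+(?P<oper>true|false)\s+"
    r"(?P<partner>yes|no)\s+(?P<fast>\d+)\s+(?P<slow>\d+)\s+"
    r"(?P<timeout>short|long)\s+(?P<scale>\d+)\s+(?P<ethertype>[0-9a-fA-F]+)\s+"
    r"(?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\s*$"
)


def parse_vlacp_rows(text):
    """Return one dict per port row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in ("fast", "slow", "scale"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def ports_needing_change(rows, wanted_scale=5):
    """List admin-enabled ports whose timeout scale differs from wanted_scale."""
    return [r["port"] for r in rows
            if r["admin"] == "true" and r["scale"] != wanted_scale]


if __name__ == "__main__":
    # The two port rows from the output above, both already at scale 5.
    sample = (
        "5 true true yes 500 30000 short 5 8103 00:00:00:00:00:00\n"
        "6 true true yes 500 30000 short 5 8103 00:00:00:00:00:00\n"
    )
    print(ports_needing_change(parse_vlacp_rows(sample)))  # prints []
```

The default of 5 for wanted_scale simply mirrors the value recommended in the bulletin; other software releases may format the table differently, so treat the regular expression as a starting point.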
The bulletin also refers to a software fix in ERS 2500 v4.2.1, ERS4500 v5.2.1 and ERS5500/5600 v6.0.2 or later maintenance releases.
We really only use VLACP as a means of detecting FEFI when the switch equipment doesn’t support autonegotiation (for example, the Nortel Ethernet Switch 470 doesn’t support autonegotiation on the 1000Mbps uplinks).
Cheers!
Update: Friday February 13, 2009
It seems that Nortel has released software 6.0.3 for the Ethernet Routing Switch 5500/5600 series switches. This release is supposed to resolve the VLACP issues that were reported in the earlier bulletin. Here’s an excerpt from the release notes:
A feature enhancement (Q01645430) that changed the VLACP interoperability behavior with Passport 8600 was removed. For further details, please see the Technical Support Bulletin ID. 2008009238, Rev 1, published on 2008-12-12.
Cheers!
Tom says
This is good to hear. We’re the site (well, at least one of the sites) that discovered this issue the first week in December. I lost some hair trying to figure out what the heck was going on. Changing the timeout does correct the problem of what can best be called “VLACP flapping”.
Michael McNamara says
Hi Tom,
Let me say “Thanks” for your efforts!
I’ve seen a lot of odd behavior from VLACP especially on the ES 460/470 and ERS 5500 series switches. I’ve been testing ERS 5500/5600 software release 6.0.1 and even with the VLACP timeout scale set to 5 we still see some occasional flapping of the ports.
Thanks for the comment!
Tom says
We ran into the same thing of seeing flapping even when the timeout scale is set to 5. For now, we have VLACP off between our ERS 8600 core and 4500s running v5.2.0.009 code. We have VLACP enabled on 4500s running 5.1.0.001 and no flapping at all.
Michael McNamara says
While we didn’t have any issues with v5.1.2.x code on a specific ERS5530 stack, we are definitely having issues with v6.0.1.x code on that same stack. As you did, I had to disable VLACP entirely between the ERS5530 stack and the SMLT ERS8600 cluster.
Richard McGovern says
This is a known issue introduced with 6.0.1.x code, and there is a Bulletin about this on the Nortel Support web site, dated 12-12-08. The Bulletin can only be seen by customers with a support contract who are logged into the support pages. I would suggest anyone in this position sign up for automatic email notification of changes to support site information.
The solution for this situation is to either 1) set VLACP timer values to 500 msec time-out with scale of 5, or 2) wait for 6.0.3.x code on 5530. This same situation applies to other stackable products, which is also detailed in the Bulletin.
Tom says
Good news. There has been an ERS 45xx software release (5.2.1) that deals with the problems associated with the VLACP flapping issue. We will look at installing this during our upcoming scheduled maintenance window.
Release notes are here:
http://support.nortel.com/go/main.jsp?cscat=DOCDETAIL&id=828529&poid=18122
Michael McNamara says
I’ve updated the original post above… Nortel has released 6.0.3 software for the Ethernet Routing Switch 5500/5600 series switches.
GreenSkol says
I’ve just discovered this blog, and as a Nortel user (bad luck) I’m happy to find people with the same problems!
I still haven’t understood why the switch should compute the mean time between VLACP packets… Didn’t the developers notice that there were timers used by the protocol?
Despite being really simple, VLACP is badly coded and doesn’t work properly between different equipment series (e.g. 5510 and 8600).
The main bug I’ve found (still not solved) is this one: VLACP / LACP is supposed to work like the TCP three-way handshake or the OSPF hello protocol.
Data should normally be flooded on the link after the handshake is done, when both switches have seen each other and (V)LACP is up.
Sadly, this is not the case with VLACP.
Try the following between an 8600 and a 5510: set up an MLT with 2 links on which VLACP is activated, and use slow timers (set to 20s).
Plug in the 1st link; the MLT should come up and data should flow correctly between the 2 switches.
Plug in the 2nd link, and you’ll observe 20s of packet loss: the 5510 waits 20s before sending any VLACP packet, but the 8600 doesn’t wait and starts flooding data just after sending its first VLACP packet. Result: the 5510 drops all received packets for 20s.
Solution from Nortel: use short timers, which reduces the loss to 500ms. Thanks…
The more I work with SMLT/IST, the more I prefer RSTP.
It’s sad to see such good hardware and such bad software :-(
Michael McNamara says
Hi GreenSkol,
I’ll have to disagree with you and say that VLACP is a great tool to have when designing highly available and redundant networks. You obviously need to understand how VLACP works and how to configure it properly to take full advantage of the features it offers. VLACP is designed to be a very lightweight protocol to provide highly available networks for VoIP. The key here is “lightweight”. Nortel didn’t require you to upgrade the switch fabric, CPU, or chassis when they delivered this feature, so the designers needed to be conscious of the load they place on the CPUs and modules. Unless you have a design requirement for using long timers (IST links) you should use short timers (recommended values are short timers at 500ms with 5 retries ~ 2.5 seconds).
With regard to your test… VLACP operated just as configured… you configured it with long timers (20 seconds) so it would take 20 seconds for it to detect a missed VLACP heartbeat, then depending on the number of retries you have configured it could take 20 seconds * retries to ultimately mark the port as VLACP down.
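The timer arithmetic above can be sketched in a few lines of Python. This is an illustration only: it assumes the failure-detection time is simply the periodic time multiplied by the timeout scale (the "retries"), with the "short" timeout type using the fast periodic time and the "long" type using the slow periodic time. The function name vlacp_detection_ms is hypothetical.

```python
def vlacp_detection_ms(timeout_type, fast_ms, slow_ms, scale):
    """Approximate time in ms before a port is marked VLACP down.

    Assumption: detection time = periodic time * timeout scale, where a
    "short" timeout uses the fast periodic time and "long" uses the slow one.
    """
    if timeout_type not in ("short", "long"):
        raise ValueError("timeout_type must be 'short' or 'long'")
    if fast_ms <= 0 or slow_ms <= 0 or scale <= 0:
        raise ValueError("timers and scale must be positive")
    periodic_ms = fast_ms if timeout_type == "short" else slow_ms
    return periodic_ms * scale


if __name__ == "__main__":
    # Recommended short timers: 500 ms with a scale of 5 (about 2.5 seconds).
    print(vlacp_detection_ms("short", 500, 30000, 5))   # 2500
    # The old default scale of 3 with the same short timer.
    print(vlacp_detection_ms("short", 500, 30000, 3))   # 1500
    # Long timers at 20 seconds with a scale of 3, as in the test described.
    print(vlacp_detection_ms("long", 500, 20000, 3))    # 60000
```

Actual behavior on a given switch and software release may differ from this simple product, so the numbers are best read as rough planning figures.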
If you had any doubt about the future of SMLT/IST you only need to look at Cisco’s new vPC technology. While Spanning Tree and Rapid Spanning Tree were/are great legacy protocols, there’s so much to be said for running a Spanning Tree free topology.
I personally think the software feature set on the ERS 8600 is quite impressive and has really matured greatly over the past two years. While Nortel has definitely had its challenges (ERS 8600 software release 4.1.6.x still puts chills down my spine), the last few software releases have been very stable and have provided additional value to the legacy hardware that’s already installed.
Thanks for the comment! Please feel free to make additional posts over at the forums; http://forums.networkinfrastructure.info/nortel-ethernet-switching/
Cheers!
GreenSkol says
Hi,
I didn’t say that VLACP and IST/SMLT are not good technologies, I just said they are “spoiled” by bad implementation and annoying bugs.
On VLACP long timers, we had to use them because short timers were incompatible with HA-mode on the 8600 (at least until 4.1.4.0)! VLACP went crazy on CPU switchover (as did all sub-second protocols) :-(
I agree with you: the latest 8600 code is really great, and now HA-mode works flawlessly on unicast traffic.
I’m now working on multicast traffic, and I can say that there’s still a lot of work to do on multicast and HA-mode!
Thanks for the link to the forum, there aren’t many Nortel user forums on the Internet!
Richard McGovern says
SMLT/RSMLT/VLACP/SLPP/etc. have evolved and matured since the introduction of the SMLT Active/Active resiliency model 7+ years ago. Like any other innovation (e.g. STP/RSTP), it takes time to fine tune due to the number of configuration permutations, designs, etc. that can potentially be implemented; this has been a growing process. This is why it is important to follow the set of documented best design practices when implementing this robust resiliency model, which the competition is trying hard to duplicate (they are still trying….) while still learning the potential bumps in the road.
I encourage everyone to review the following guide, which our team of senior solution architects maintains, to ensure you achieve the highest level of availability. https://support.nortel.com/go/main.jsp?cscat=DOCDETAIL&id=948343&poid=9015
I would also recommend paying attention to the Converged Campus Solutions guides. There are now both large and small campus versions posted to the Nortel web site, and a medium one will be posted soon. These all provide best practice recommendations which will ensure you get the best experience in using this key differentiating Nortel feature set.
Richard McGovern
ERS 8600 PLM, Nortel
Richard McGovern says
In regard to the HA comments: HA (High Availability) can be achieved in different ways. Some vendors have focused on BOX high availability and others have focused on network availability. Mathematically, if one is familiar with SLAs, network resiliency decreases the downtime risk factor more than box resiliency. Nortel chose to first focus on the network resiliency model and get to sub-second failover with such network designs. That is not to say device high availability isn’t important, and if you are familiar with our roadmap you will have seen the constant evolution of the device HA model, with almost all protocols now running in HA mode. There are a few left, including multicast, which is more complicated than unicast protocols, but we continue to make great progress on that front. As stated in my earlier reply, focus on best network design practices and it will achieve the level of availability any business requires. We have customers running HUGE multicast deployments following our best practices that have achieved not 5×9’s reliability but near 100% uptime for both unicast and multicast IP communication.
Michael McNamara says
Thanks for the comments Rich.
It’s great to see Nortel putting out these technical documents.
Cheers!
Michael Melanson says
Greetings,
Maybe you can provide me a link to a test plan for testing VLACP. We have a new install where the integrator is insisting on using VLACP. I have not had the best of experiences with other integrators. I see lots of good resources here for setting it up. What I am in need of is a way to really prove it out. Any thoughts or suggestions? Please don’t tear my head off if I am sounding like an idiot, I just really do not get how it functions and its drawbacks, if any.
Many thanks
Michael McNamara says
Hi Michael,
I don’t think I’ve ever bitten anyone’s head off on this site or on the discussion forums.
How do you test it? Well, you could disable VLACP on one side of the connection. Or you could physically move the connection to a different port/switch to simulate link; without the VLACP PDUs, the far end shouldn’t bring up the port.
You could do one of the two above to see it in a broken state.
You’ll find quite a few discussions around VLACP on the discussion forums. You can use this link to search Google for VLACP:
http://www.google.com/search?q=site%3Aforums.networkinfrastructure.info+vlacp
Cheers!
Michael Melanson says
Mike, that was a global comment, no offense was intended. My apologies if it was taken that way. I have been flamed at times for asking “DFQs” (Dumb F Questions).
Over the years, any time VLACP was put in, I eventually removed it due to it being problematic, usually as a result of poor implementation by the professional service provider.
In this case, it’s being “rammed down my throat”. I have a great relationship with our Avaya team. They are assuring me that this time it won’t keep bringing down the network. I need to test it thoroughly. Call it being overly diligent and not wanting to get burned.
Thanks for the suggestions. If you have any more ideas, feel free to share them.
I am hoping to come up with a more global test plan for VLACP for future use as well.
vimal says
I have one edge stack of ERS 5520 (3 switches) and a routing stack of 5510 (2 switches).
I defined an MLT on either side, using ports 1/48 and 3/48 (edge stack) and 1/1 and 2/1 on the routing stack. The MLT looks OK when no VLACP is defined, but when we enable VLACP the connectivity between them is lost.
Note: they are connected with copper uplinks.
config
vlacp enable
vlacp macaddress 180.c200.1100
interface FastEthernet ALL
vlacp port 1/1-3 timeout short
vlacp port 1/1-3 timeout-scale 5
vlacp port 1/4-24 timeout long
vlacp port 1/4-24 timeout-scale 3
vlacp port 2/1-3 timeout short
vlacp port 2/1-3 timeout-scale 5
vlacp port 1/1-24,2/1-24 ethertype 0x8103
vlacp port 1/1-24,2/1-24 funcmac-addr 0.0.0
vlacp port 1/1-24,2/1-24 fast-periodic-time 500
vlacp port 1/1-24,2/1-24 slow-periodic-time 30000
vlacp port 2/4-24 timeout long
vlacp port 2/4-24 timeout-scale 3
vlacp port 1/1-3,2/1-3 enable
no vlacp port 1/4-24,2/4-24 enable
!
vlacp enable
vlacp macaddress 180.c200.1100
interface FastEthernet ALL
vlacp port 1/1-47 timeout long
vlacp port 1/1-47 timeout-scale 3
vlacp port 1/48 timeout short
vlacp port 1/48 timeout-scale 5
vlacp port 2/1-48,3/1-47 timeout long
vlacp port 2/1-48,3/1-47 timeout-scale 3
vlacp port 1/1-48,2/1-48,3/1-48 ethertype 0x8103
vlacp port 1/1-48,2/1-48,3/1-48 funcmac-addr 0.0.0
vlacp port 1/1-48,2/1-48,3/1-48 fast-periodic-time 500
vlacp port 1/1-48,2/1-48,3/1-48 slow-periodic-time 30000
vlacp port 3/48 timeout short
vlacp port 3/48 timeout-scale 5
vlacp port 1/48,3/48 enable
no vlacp port 1/1-47,2/1-48,3/1-47 enable
exit
Michael McNamara says
You’ve provided some configuration but not the output of the operational commands?
“show vlacp interface 1/48,3/48”
These switches are directly cabled together? Nothing in-between them?
Cheers!