What is VLACP and why do I need it? That’s a question I see asked quite frequently, and not all the answers are correct. I’m hoping to answer it once and for all by explaining the rationale behind the protocol and describing some of the issues and difficulties that VLACP helps address.
What is VLACP?
VLACP is an Avaya proprietary protocol used to detect end-to-end failures. VLACP takes the point-to-point hello mechanism of LACP and uses it to periodically send hello packets to ensure end-to-end reachability and provide failure detection across a Layer 2 network. When hello packets are not received within the configured timeout, VLACP transitions to a failure state and the port is brought down.
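To make the hello mechanism concrete, here is a minimal Python sketch of timeout-based liveness detection on a single port. It is only an illustration and not Avaya’s implementation: the class and method names are made up, and it assumes the timeout is simply the hello interval multiplied by a scale factor, using the 500ms and 5 values from the configuration example later in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VlacpPortMonitor:
    """Toy model of hello-based liveness detection on one port (hypothetical)."""

    fast_periodic_time_ms: int = 500    # interval between hello PDUs
    timeout_scale: int = 5              # hello intervals tolerated without a PDU
    last_pdu_ms: Optional[int] = None   # arrival time of the most recent PDU

    @property
    def timeout_ms(self) -> int:
        # Assumption: timeout = hello interval x scale factor.
        return self.fast_periodic_time_ms * self.timeout_scale

    def receive_pdu(self, now_ms: int) -> None:
        """Record that a hello PDU arrived from the far end."""
        self.last_pdu_ms = now_ms

    def is_up(self, now_ms: int) -> bool:
        """The port is up only while PDUs keep arriving within the timeout."""
        if self.last_pdu_ms is None:
            return False  # nothing heard from the far end yet
        return now_ms - self.last_pdu_ms < self.timeout_ms


monitor = VlacpPortMonitor()
monitor.receive_pdu(now_ms=1000)
print(monitor.is_up(now_ms=2000))  # still inside the timeout window
print(monitor.is_up(now_ms=4000))  # no PDU for longer than the timeout
```

The point of the sketch is that the decision depends on PDUs from the far end, not on local link state, which is why the check still works when there is a carrier network in the middle.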
Why use VLACP?
We know that auto-negotiation supports both RFI (Remote Fault Indication) and FEFI (Far End Fault Indication), so if our interfaces are configured for auto-negotiation, those mechanisms will protect us by detecting a link failure. If auto-negotiation is not available (as with the Ethernet Switch 470 GBICs), VLACP can help detect link failures and prevent frames from being transmitted into oblivion when the far end is down. If we are using a LAN extension or Q-in-Q solution, link status only proves that we have connectivity to the edge of the carrier network. VLACP PDUs flow across the entire carrier network, verifying that we have connectivity end to end across the entire Layer 2 network.
Why is VLACP so important in an IST/SMLT configuration?
Well, we know that failure detection is a big issue and something that a lot of vendors work towards refining. You have a link failure or core failure and the network quickly converges and re-routes traffic so there’s little or no disruption. But let me ask you: what happens when that link or core recovers?
Let me take a core switch failure as an example, which should easily make the point. When that switch starts to recover, we don’t want it to immediately start accepting packets from the edge/distribution switches in the network until it has re-established the IST link. Let’s say we also want a complete routing table populated before we start accepting packets. We need a way to bring up the port to the edge/distribution switch, but we also need to let that switch know that we’re not yet ready to bridge/route traffic. VLACP solves that problem by allowing the link to establish while sending VLACP PDUs that tell the far end switch not to start forwarding frames until we’re ready to receive them.
Avaya has been working to refine how VLACP works and released a significant improvement in March 2011, which is available in the following software releases:
- Ethernet Routing Switch 8600 v5.1.4 (or later)
- Ethernet Routing Switch 5000 Series v6.1.6 (or later)
- Ethernet Routing Switch 4000 Series v5.5 (or later)
Here’s a blurb on the change from the ERS 8600 release notes:
VLACP HOLD Enhancement
During SMLT node failure scenarios, traffic loss may be observed in certain scaled SMLT configurations with hundreds of SLTs, hundreds of ports and tens of VLANs. The root cause for the traffic loss was that the ERS8600 ports would come up prematurely at the physical layer causing the remote end to start sending traffic toward the ERS8600 that just came up. On the ERS8600 that just rebooted, the communication between the line cards and the CP may take several seconds in such scaled configurations. This resulted in black-holing the traffic arriving on such ports which were physically up but all operational configuration was not yet performed on those ports by the CP. The VLACP SUBTYPE HOLD feature introduces a new VLACP PDU with a new subtype HOLD to help reduce traffic loss in such scenarios.
The goal of this new implementation is to “hold down” all VLACP enabled links for a specific period of time after a reboot. This prevents remote VLACP enabled devices that understand the new VLACP HOLD PDU from sending data to the ERS8600. This will ensure that all VLACP enabled ports on the ERS8600 have had sufficient time to come up with all operational configuration and are ready to receive and forward the ingress traffic.
ERS8600 switches with 5.1.4.0 release are capable of both sending and receiving VLACP HOLD PDUs. Future code revisions of the Baystack switch family will support receipt and processing of VLACP HOLD PDUs, but will not generate them. Please refer to the applicable product release notes for information regarding product specific software levels required for support of this VLACP enhancement. VLACP is an Avaya proprietary protocol and hence this enhancement is not applicable when connecting to switches from other vendors.
By default, the VLACP HOLD feature will be disabled. The feature is enabled by configuring a positive value for VLACP HOLD Time. The VLACP Hold Time value configured should be selected based on the specific recovery implementation requirements, size and recovery characteristics for your network implementation.
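Stepping outside the release notes for a moment, the HOLD behavior described above can be sketched in a few lines of Python. This is my own simplified reading of the blurb and not Avaya’s code: the enum, the function names and the idea of a single "seconds since boot" clock are all assumptions made for illustration.

```python
from enum import Enum


class PduSubtype(Enum):
    NORMAL = "normal"  # regular VLACP hello
    HOLD = "hold"      # "I am up, but do not send me traffic yet"


def subtype_to_send(seconds_since_boot: float, hold_time_s: float) -> PduSubtype:
    """Subtype the recovering switch would advertise at this moment (hypothetical)."""
    # A hold time of 0 models the default, where the feature is disabled.
    if hold_time_s > 0 and seconds_since_boot < hold_time_s:
        return PduSubtype.HOLD
    return PduSubtype.NORMAL


def peer_may_forward(last_subtype: PduSubtype, pdu_is_fresh: bool) -> bool:
    """Whether the remote switch should send traffic into the link (hypothetical)."""
    # The peer forwards only if PDUs are still arriving and the latest one
    # was not a HOLD.
    return pdu_is_fresh and last_subtype is PduSubtype.NORMAL


# Walk through a reboot with a 20 second hold time.
for t in (5, 19, 20, 30):
    subtype = subtype_to_send(t, hold_time_s=20)
    print(t, subtype.name, peer_may_forward(subtype, pdu_is_fresh=True))
```

With a hold time of 20, the remote side in this model keeps traffic off the link for the first 20 seconds after the reboot even though the port is physically up, which is exactly the window the release notes are trying to protect.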
How do you configure VLACP?
Since VLACP is an Avaya (formerly Nortel) proprietary protocol, you can only configure VLACP on a point-to-point link between two Avaya switches. In a scenario where you are utilizing a carrier TLS (Transparent LAN Services), the two switches at the ends of the network need to be Avaya, but the switches in the carrier network can be from any manufacturer so long as they forward the Layer 2 VLACP PDUs through the network.
Here’s a quick example of how to enable VLACP on a DMLT (Distributed MultiLink Trunk) between an edge Ethernet Routing Switch 5000 and an Ethernet Routing Switch 8600:
Ethernet Routing Switch 5520
interface fastEthernet 1/48,2/48
vlacp port 1/48,2/48 timeout short
vlacp port 1/48,2/48 timeout-scale 5
vlacp port 1/48,2/48 fast-periodic-time 500
vlacp port 1/48,2/48 enable
exit
vlacp macaddress 01:80:c2:00:00:0f
vlacp hold_time 20
vlacp enable
Ethernet Routing Switch 8600 (CLI)
config ethernet 3/1,4/1 vlacp fast-periodic-time 500
config ethernet 3/1,4/1 vlacp timeout short
config ethernet 3/1,4/1 vlacp timeout-scale 5
config ethernet 3/1,4/1 vlacp macaddress 01:80:c2:00:00:0f
config ethernet 3/1,4/1 vlacp enable
config vlacp hold-time 20
config vlacp enable
Is VLACP right for me?
If you are running a pair of Avaya Ethernet Routing Switch 5000 or 8600/8800s in a switch cluster then you should most definitely be utilizing VLACP. If you are running a multi-vendor network then VLACP might not be possible since it’s an Avaya proprietary protocol. If you are running a simple flat network with MLT or DMLT links between Ethernet Routing Switch 4000 and 5000 series switches then VLACP might not provide a whole lot of value assuming you are running auto-negotiation and have RFI and FEFI capabilities.
There have been a number of issues with VLACP over the past few years, but a great many of those have been resolved in later software releases. If you have hundreds of interfaces running VLACP, you can run into scaling issues depending on the CPU/SF that you have in your Ethernet Routing Switch 8600. If you stick with the recommended short timer value of 500ms (fast-periodic-time) with 5 retries (timeout-scale), you shouldn’t have any issues. Yes, that equates to 2.5 seconds before an interface is marked down by VLACP, but that’s a value that most peripherals should be able to tolerate, including Avaya’s IP telephony. You can be more aggressive with the retry count, but you might end up missing VLACP polls and have interfaces marked down when they aren’t actually down.
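The trade-off between the retry count and false link-down events can be shown with a small Python sketch. It is a simplification under stated assumptions: each list entry stands for one hello interval, True means a PDU arrived in that interval, and the port is marked down once the number of consecutive missed intervals reaches the retry count. The function name and the sample pattern are hypothetical.

```python
from typing import Sequence


def marked_down(received: Sequence[bool], timeout_scale: int) -> bool:
    """True if `timeout_scale` consecutive hello intervals pass with no PDU."""
    missed = 0
    for got_pdu in received:
        # Any PDU resets the counter; a missed interval increments it.
        missed = 0 if got_pdu else missed + 1
        if missed >= timeout_scale:
            return True
    return False


# A healthy link where a busy CPU drops three hellos in a row.
pattern = [True, True, False, False, False, True, True]

for scale in (5, 3):
    detection_ms = 500 * scale  # 500ms hello interval, as in the config above
    print(scale, detection_ms, marked_down(pattern, scale))
```

In this model the 5-retry setting rides out the short burst of lost hellos, while the 3-retry setting detects a real failure a second sooner but also takes down a link that was never actually broken.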
I’ve been running VLACP at every site I have for the past 3 years and have had very few problems. It’s actually saved me on a number of occasions: the Ethernet Switch 470 48Ts don’t support auto-negotiation on the GBICs, and VLACP has been able to detect the problem and mark the link as down, allowing the traffic to flow over the remaining link(s) with no interruption to user traffic.
Are you running VLACP?
Cheers!
Martin Sebek says
Hello,
I’ve just found out that vlacp does not work between ERS-8010 (7.1.3.0) and ERS-5510 (6.2.4). If I downgrade 5510 to sw version 5.0.0.011 it works like a charm. Even the newest 5.x software (5.1.5) has the very same problem.
Michael McNamara says
Hi Martin,
I’ve not heard of any issues myself. I will try and test your scenario and let you know what I find.
Cheers!
Scott Meadows says
I’ve noticed that sometimes when fiber GBICs fail, they will continue to send light to the far end but not receive correctly. In this condition, the far end will keep the link up and continue to send traffic, creating a “black hole” effect. VLACP has helped to eliminate this situation, but I don’t understand why it is needed if each end is directly connected with a piece of fiber with no other devices in the middle. I would think the RFI and FEFI capabilities that you refer to would take care of the problem. This problem was especially frequent on Avago 10 gig GBICs, but we have since removed all of those per Avaya’s recommendation.
Michael McNamara says
Hi Scott,
In some cases the GBICs/SFPs don’t support auto-negotiation. Without auto-negotiation there is no RFI/FEFI, so you’re left with the situation you describe.
As an example the original Ethernet Switch 470 doesn’t support auto-negotiation on the GBIC ports so you must lock the speed and duplex at 1000Mbps full duplex. Without VLACP you would (and I have) run into the scenario you described above.
I’m guessing the Avago optics don’t support auto-negotiation or RFI/FEFI?
Cheers!
ibrahim says
I have a three-tier network with a pair of ERS 8600s in the core and a pair of Juniper EX4200s. Can I use VLACP? Is VLACP only for the IST and SMLT ports on the Nortel side?
Thx
Michael McNamara says
Hi Ibrahim,
VLACP is an Avaya (formerly Nortel) proprietary protocol so you can only use it on links between two Avaya switches. You can’t use it connecting to a Juniper EX4200.
Cheers!
Theo says
Hi Michael,
we are running VLACP. As you mentioned the ES-470 GBIC ports do not support autonegotiation and therefore no RFI, additionally this is the case for all ERS-8600 GBIC slots when using copper GBICs (fiber is ok, SFP slots are ok, too). But there are additional benefits. Autonegotiation only checks whether one Ethernet circuit sees the other, VLACP checks whether one CPU can reach the other, this has saved us in cases where the ports remained up, but switching didn’t work anymore because of hardware failures or software lock-ups. Even without the “hold” feature it can help when switches are rebooting, with an ES-470 for example the ports come up about 40s before switching works.
Cheers!
Michael McNamara says
Thanks for the comment Theo, you make great points.
Cheers!
Jan Hugo Prins says
I’m wondering about something. I just read this article and think this is really something I need. I currently have an interconnect between 2 data-centres running over a DWDM ring, and I use a regular MLT for this. I have seen the blackholing problem with my 10G optics and I want to introduce VLACP.
When I configure this, how do I do it without losing connectivity halfway through the configuration? I imagine that the moment you configure it on site A, the other side loses connectivity because it doesn’t know about it yet.
Thanks for your nice blog posts.
Michael McNamara says
Hi Jan,
In previous releases VLACP would only mark the far side as down when the VLACP timeout expired, so if you had your timers set to 500ms and 5 retries, you’d have 2.5 seconds before the port would be blocked by VLACP. The behavior of VLACP recently changed such that the port will immediately start blocking traffic until it receives a valid VLACP frame. So it’s now advisable to shut down/disable the port prior to making any changes. Assuming you have some redundancy and alternate L2 paths, you could allow your traffic to traverse those alternate paths while you are making the configuration changes.
Good Luck!