This is yet another example of AT&T failing a large enterprise customer. While this post has nothing to do with the recent hubbub around AT&T’s new “5G E” marketing campaign, it highlights the continued challenges enterprises have in dealing with large telecom carriers that either don’t care or simply don’t have the ability to operate and manage large-scale networks effectively.
The incident started as most incidents start… a call from the Help Desk alerting that there was a lot of red on the network dashboard and e-mail alerts were flowing in by the hundreds. In this case I got a call from one of my network engineers informing me that the primary AT&T AVPN/MPLS link into our primary Data Center was down and had been down for almost 60 minutes. That was very unwelcome news, as it would significantly impact a large portion of our user base and a number of business-critical applications.
While AT&T was supposedly “testing” the circuit, my team and I went about re-routing traffic through a secondary Data Center that was still connected to the AT&T AVPN/MPLS network. From our secondary Data Center, traffic could flow over a dedicated WAN link to our primary Data Center. Working through the BGP and EIGRP route maps and policies took about 2 hours before the majority of traffic was re-routed and working again, although the re-route added about 140ms of round-trip time to every IP packet, since traffic needed to traverse the US West coast instead of the US East coast. We have firewalls throughout the WAN layer, so asymmetric routing causes all sorts of problems, and since we also have some DMVPN sprinkled in there, some care and planning was needed to re-route traffic successfully.
At the 3-hour mark AT&T declared that the circuit was good and that no issues had been found on it. The technician explained that we should “verify power”. At the 7-hour mark AT&T told us that their last-mile provider, Verizon, had de-provisioned the 1Gbps transport, and that was the cause of our outage.
Thankfully Verizon was able to re-provision the circuit within 20 minutes, although it would take AT&T and Verizon another 9 hours before they could commit that the circuit wouldn’t be “automatically” de-provisioned again the following night.
I truly miss the days of unmanaged dark fiber, where all I needed to worry about were fiber breaks and my own gear… while we had a number of fiber breaks, they were fairly infrequent and in the majority of cases they were remedied within 2 hours. I can’t even get a call back from AT&T in under 2 hours, never mind a resolution.
What story do you have to share regarding any telecom carriers?
Cheers!
Patrick says
Hi Michael,
I can recommend Orange Business Services as a very good service provider, especially their Business VPN products with managed CPE. In this case, if you have a problem, they just ask you to check the cabling/power and LED status and to reboot the CPE. If the problem still exists, they get in contact with their partners and solve the issue quickly (quite often ;). What I really like about OBS is their customer-oriented work attitude. One of the best B2B relationships I have had in the last 15 years of working in IT.
Best Regards,
Patrick
Michael McNamara says
Thanks for the recommendation Patrick.
Avery Abbott says
I feel your pain. More than once I’ve lost a VPLS WAN circuit due to a miscommunication between Big CenturyLink and LEC CenturyLink. It’s hilarious when CenturyLink says that CenturyLink is to blame, except that my circuit is still down…
Michael McNamara says
We have the same issue with AT&T… AT&T East blaming AT&T West (SBC), or the AT&T IP transport team blaming the LEC when SBC (AT&T) is the LEC.
Thanks for the comment Avery!
Tom says
Had this happen last? (it all runs together at this point) year. We have a T1 that handles a backup DNS server. Got an alarm that connectivity was lost with the server. It’s an AT&T Managed Service connection that uses Verizon for the local loop on our end. Turns out that Verizon just de-provisioned the circuit. It took 2 days to get it operational.
Del Bullion says
I am regularly asked to recommend a service provider. Whenever I am asked who I like, my response is always, “Whoever I am not fighting with today.”