I ran into an interesting problem this past week after I made a fairly small change on one of my border BGP routers. We upgraded one link from a 100Mbps circuit to a 1000Mbps circuit, and it was decided that we should use this link as our preferred path for all traffic egressing our network. We had previously been using a Comcast link as the preferred egress path for all traffic, but we were going to change that using the BGP local-pref attribute. While those changes themselves were relatively straightforward and went off without a hitch, there was an unintended consequence that stumped me for a few days.
Upon making the change we received notification from our external monitoring servers that our Level3, Comcast and Verizon WAN IP interfaces had become unreachable. Previously they were reachable from our two external monitoring servers (Linux servers hosted in a VPS on opposite coasts of the United States). The alarm was a surprise, but when I checked the Cisco ASR1001 interfaces everything was up and running. Sure enough, though, the two Linux servers were unable to ping the WAN IP interfaces on the border router for the Level3, Comcast and Verizon circuits. The two Linux servers were able to ping the WAN IP interface of the AT&T circuit. If I issued a ping from the Cisco ASR1001 itself, it had no issues pinging the Linux servers. If I tried to ping the two Linux servers from the router by sourcing the traffic from the previously mentioned WAN IP interfaces, that would fail as well. That was odd, I thought. What was going on there?
Prior to this change the BGP local-pref preferred the Comcast circuit for all outbound traffic, as visualized below.
Once we made the BGP local-pref change, all IP traffic was egressing the AT&T circuit, as visualized below.
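To make the effect of the local-pref change concrete, here is a minimal Python sketch of the selection rule, assuming made-up local-pref values (the real ones are not given in this post). It only models the "highest local-pref wins" step; a real BGP best-path decision is made per prefix and falls through to further tie-breakers.

```python
# Hypothetical local-pref values for illustration only; the real values
# configured on the router are not stated in the post.
def preferred_egress(local_pref_by_circuit):
    """Return the circuit with the highest local preference (higher wins)."""
    return max(local_pref_by_circuit, key=local_pref_by_circuit.get)


# Before the change: Comcast carries the highest local-pref.
before = {"comcast": 200, "att": 100, "level3": 100, "verizon": 100}

# After the change: AT&T is raised above Comcast.
after = dict(before, att=300)

print(preferred_egress(before))
print(preferred_egress(after))
```

The point of the sketch is that local-pref only decides which circuit outbound traffic leaves on. It says nothing about which source address that traffic carries.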
There was never a problem reaching any of the ARIN IP address blocks that we were advertising via BGP. The problem was isolated to just the WAN IP interfaces of the other Internet Service Providers.
The problem turned out to be that traffic to a WAN IP address was ingressing the circuit that the IP address was assigned to, but the reply would egress the AT&T circuit due to the BGP local-pref statement. I’m guessing that AT&T is filtering traffic as it enters their network, checking for traffic sourced from an IP address block that has no business coming from that link, and dropping it.
So an ICMP packet to the Comcast WAN IP address would ingress the Comcast interface, and the reply would egress the AT&T interface with a source IP address of the Comcast WAN interface. That packet would hit the AT&T head-end router, which would discard any packet not sourced from a valid ARIN IP address block belonging to that link, similar to unicast Reverse Path Forwarding (uRPF). I was able to verify this by placing a pair of static routes on the router using the Comcast circuit as a return path, and with that the two Linux servers were able to ping all the WAN IP interfaces. I’m guessing that while AT&T does some source address filtering, Comcast isn’t doing any.
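The behavior described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: all addresses are documentation prefixes (RFC 5737) that I made up for illustration, the filtering behavior of each provider is my guess from the symptoms, and only two of the four circuits are shown.

```python
import ipaddress

# Hypothetical addressing; none of these values come from the real network.
ADVERTISED_BLOCK = ipaddress.ip_network("198.51.100.0/24")  # our BGP-advertised block
WAN_LINKS = {
    "att": ipaddress.ip_network("192.0.2.0/30"),
    "comcast": ipaddress.ip_network("192.0.2.4/30"),
}
# Assumption from the symptoms: AT&T filters on source address, Comcast does not.
PROVIDER_FILTERS = {"att": True, "comcast": False}
MONITORS = ["203.0.113.10", "203.0.113.20"]  # the two monitoring servers


def provider_accepts(provider, source_ip):
    """True if the provider's edge would forward a packet with this source."""
    if not PROVIDER_FILTERS[provider]:
        return True
    src = ipaddress.ip_address(source_ip)
    return src in ADVERTISED_BLOCK or src in WAN_LINKS[provider]


def egress_for(destination_ip, routes):
    """Pick the egress circuit by longest-prefix match over (prefix, circuit) pairs."""
    dst = ipaddress.ip_address(destination_ip)
    matches = [(net, circuit) for net, circuit in routes if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def reply_delivered(source_ip, destination_ip, routes):
    """True if a reply from source_ip survives the egress provider's filter."""
    return provider_accepts(egress_for(destination_ip, routes), source_ip)


# After the local-pref change, everything egresses AT&T.
routes_before_fix = [(ipaddress.ip_network("0.0.0.0/0"), "att")]
# The workaround: host routes to the monitoring servers via Comcast.
routes_after_fix = routes_before_fix + [
    (ipaddress.ip_network(m + "/32"), "comcast") for m in MONITORS
]

comcast_wan_ip = "192.0.2.6"
for monitor in MONITORS:
    print(monitor,
          reply_delivered(comcast_wan_ip, monitor, routes_before_fix),
          reply_delivered(comcast_wan_ip, monitor, routes_after_fix))
```

In this model a reply sourced from the Comcast WAN IP is dropped while it egresses AT&T, and is delivered once the more specific static routes send it out the Comcast circuit, which matches what I observed.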
I think it’s great that AT&T is filtering their inbound traffic for valid source IP blocks; it definitely helps prevent IP spoofing.
The confusion came when I ran debug ip icmp and later debug ip packet 100 detail and observed no ICMP traffic coming from either of the two Linux servers on the Cisco ASR1001. I had a ticket open with Cisco TAC and they were also unable to explain the oddity. I’m curious whether this had something to do with CEF, and whether I might need to enable no ip route-cache on the specific interfaces.
Cheers!