We had an odd problem over the weekend: a recently installed Opengear ACM7004 started intermittently dropping off the internal network. Interestingly, the same Opengear was also connected to the public Internet and was having no issues on that NIC, so we had to troubleshoot over a reverse SSH tunnel, with no web user interface.
I wanted to answer a few basic questions:
- Is there LINK?
- What speed and duplex are we auto-negotiating?
- Any errors on the switch side or host side?
There are a few different tools in Linux to help troubleshoot basic network connectivity issues; ifconfig, netstat, ethtool, and ip are near the top of the pile.
    $ ethtool eth0
    Settings for eth0:
            Supported ports: [ TP MII ]
            Supported link modes:   10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
                                    1000baseT/Half 1000baseT/Full
            Supports auto-negotiation: Yes
            Advertised link modes:  10baseT/Half 10baseT/Full
                                    100baseT/Half 100baseT/Full
                                    1000baseT/Half 1000baseT/Full
            Advertised auto-negotiation: Yes
            Speed: 10Mb/s
            Duplex: Half
            Port: MII
            PHYAD: 0
            Transceiver: external
            Auto-negotiation: on
    Cannot get wake-on-lan settings: Operation not permitted
            Link detected: no
In the output above eth0 is down; in the output below eth0 is up. The NIC appeared to be bouncing up and down intermittently for no apparent reason.
    $ ip link
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: mtu 1500 qdisc mq state UP qlen 532
        link/ether 00:13:c6:aa:bb:cc brd ff:ff:ff:ff:ff:ff
    3: eth1: mtu 1500 qdisc mq state UP qlen 532
        link/ether 00:13:c6:aa:bb:cc brd ff:ff:ff:ff:ff:ff
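Catching a flapping NIC by re-running ethtool and ip by hand is hit and miss. As a rough sketch (not what I ran at the time), the same link, speed and duplex details can be read from sysfs and snapshotted twice to see whether the carrier changed in between. This assumes a Linux host with Python 3 and a kernel that exposes `carrier_changes` under `/sys/class/net`, which may well not be true of the Opengear itself; `read_link_state` and `count_flaps` are names made up for the example.

```python
from pathlib import Path

# Attributes the kernel exposes per interface under /sys/class/net/<iface>/.
ATTRS = ("operstate", "carrier", "speed", "duplex", "carrier_changes")


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Return the link attributes for iface as strings (None if unreadable)."""
    base = Path(sysfs_root) / iface
    state = {}
    for attr in ATTRS:
        try:
            state[attr] = (base / attr).read_text().strip()
        except OSError:
            # Some attributes cannot be read while the link is down, and
            # carrier_changes does not exist on older kernels.
            state[attr] = None
    return state


def count_flaps(before, after):
    """Carrier changes between two snapshots, or None if the counter is missing."""
    if before["carrier_changes"] is None or after["carrier_changes"] is None:
        return None
    return int(after["carrier_changes"]) - int(before["carrier_changes"])
```

Calling `read_link_state("eth0")` a minute apart and passing both results to `count_flaps` gives a host-side view of how often the link is bouncing, which can then be compared with what the switch reports.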
OK, so I was specifically interested in eth0. At the time it was reachable (i.e. working), so I had a look at the Juniper EX4300 switch and found a few issues:
show interfaces ge-1/0/4 extensive | match error
Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: Auto, Duplex: Auto, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled,
Input errors:
Errors: 10, Drops: 0, Framing errors: 10, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0,
FIFO errors: 0, Resource errors: 0
Output errors:
Carrier transitions: 1399, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0,
Resource errors: 0
CRC/Align errors 10 0
FIFO errors 0 0
There were 1399 carrier transitions (the port had bounced up and down 1399 times), which immediately told me there was definitely a problem somewhere. The CRC/Align errors could be a result of the port bouncing so much. I was able to quickly correlate the logs from the Juniper switch with the monitoring system: the monitoring system was losing connectivity to the Opengear whenever the switch port went down, which is obviously expected. So this was essentially a physical Layer 1 problem, perhaps a cabling issue?
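If you want to track those counters over time rather than eyeball them, here is a minimal sketch that pulls the `Name: value` pairs out of saved `show interfaces ... extensive` text. It assumes the output looks like the excerpt above, with `Input errors:` and `Output errors:` header lines; `parse_error_counters` is a hypothetical helper, not anything Juniper ships.

```python
import re

# Header lines that open each counter section in the saved output.
SECTION = re.compile(r"^\s*(Input|Output) errors:\s*$")
# One "Name: 123" pair, terminated by a comma or the end of the line.
COUNTER = re.compile(r"([A-Za-z][A-Za-z0-9 /-]*):\s*(\d+)\s*(?:,|$)")


def parse_error_counters(text):
    """Map (section, counter name) to its integer value."""
    counters = {}
    section = None
    for line in text.splitlines():
        header = SECTION.match(line)
        if header:
            section = header.group(1).lower()
            continue
        if section is None:
            continue  # ignore anything before the first section header
        for name, value in COUNTER.findall(line):
            counters[(section, name.strip())] = int(value)
    return counters
```

Parsing two captures taken a few minutes apart and comparing `("output", "Carrier transitions")` shows whether the port is still flapping or whether the count has settled.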
1000BaseT requires all 8 wires (four pairs) to make a connection, while 100BaseT only requires 4 wires (two pairs). So I changed the Juniper switch port to auto-negotiate at 10Mbps or 100Mbps, but not 1Gbps, and the port immediately connected.
set interfaces ge-1/0/4 speed auto-10m-100m
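To spell out the reasoning behind capping the speed: 10BASE-T and 100BASE-TX only use the pairs on pins 1-2 and 3-6, while 1000BASE-T uses all four pairs. The sketch below is a deliberately simplified model of that, treating a conductor as either good or bad. Real cable faults are often marginal rather than clean breaks, so take it as an illustration and not as a diagnosis; `usable_speeds` is a name invented for the example.

```python
# Pins on the 8P8C (RJ45) plug that each twisted-pair Ethernet standard uses.
REQUIRED_PINS = {
    "10BASE-T": {1, 2, 3, 6},
    "100BASE-TX": {1, 2, 3, 6},
    "1000BASE-T": {1, 2, 3, 4, 5, 6, 7, 8},
}


def usable_speeds(bad_pins):
    """Return the standards that avoid every pin listed in bad_pins."""
    bad = set(bad_pins)
    return [name for name, pins in REQUIRED_PINS.items() if not (pins & bad)]
```

With a fault on pin 4, 5, 7 or 8, `usable_speeds` still returns the 10 and 100 Mbps standards but not 1000BASE-T, which is consistent with the port holding steady once gigabit was taken out of the negotiation.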
I’m going to guess that we have a bad patch cable between the Juniper EX4300 and the Opengear ACM7004, but for now we can run the Opengear at 100Mbps without an issue.
Cheers!