I recently read Tom Hollingsworth’s post titled “Is It Really Always The Network?” and immediately thought, I need to find time to post a reply. I’ve been in this industry for more than 20 years now and it has been a constant struggle to educate and train those around me to perform their due diligence before issuing the knee-jerk response that “it must be the network.” If you don’t understand the problem, then ask for help. Don’t pretend to understand the problem and then assign blame when you have no idea what you are talking about. I suspect people are getting worse and not better, although I almost wonder if people are just getting lazier and want someone else to fix their problems. Last week I had two examples of people throwing sh*t over the wall without even performing the most basic troubleshooting steps, and as you can already guess, I was pissed.
In the first example, a SOAP/XML interface to a third party wasn’t accepting transactions. Well, that’s proof positive that it’s a network problem, right? A simple “telnet” test to the IP address and port from the origin server was successful. Yet the response from the team reporting the problem? “If it’s not the network, we don’t know what to do next,” which left me speechless. After a 15-minute conference call, I had the third party restart their back-end service and magically transactions started flowing again.
“If it’s not the network we don’t know what to-do next” 15 minutes later “oh, we restarted the back-end service and it’s working now”. Doh!
— Michael McNamara (@mfMcNamara) January 17, 2017
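For anyone who wants to script the “telnet” test from the first example, here is a minimal sketch in Python using only the standard library. The function name `tcp_port_open` is my own invention for illustration, and all it proves is that the TCP handshake completes. It says nothing about whether the application behind the port is healthy, which was exactly the situation above.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes.

    This is roughly what a bare "telnet host port" tells you: the
    three-way handshake succeeded. It does not test the application.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling something like `tcp_port_open("203.0.113.10", 443)` from the origin server (the address and port here are placeholders, not the real endpoint) gives a quick yes or no before anyone opens a ticket.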
In the second example, I was notified that a SOAP/XML interface that we host on the public Internet was inaccessible, so it must be a network issue. I verified that I was unable to connect to that specific host from the internal network on the TCP port specified and advised that they should check the host in question. I was rebuked by the application analyst, who told me, “Nginx is up and running.” A co-worker remotely connected to the server and found Nginx prompting for the passphrase (password) for the SSL certificate that it was trying to load. The team in question had never even checked the server.
It’s one thing to say, “I don’t know what’s going on here, can you help me?” It’s a completely different thing to say, “It’s a network problem, you need to fix it,” especially when it becomes abundantly clear that the team or person making this statement has done zero troubleshooting or due diligence and doesn’t even understand the details of the problem.
Occasionally we run into a genuine network problem; yes, they do occur. Unfortunately, I’ve heard so many crying-wolf stories that I almost never trust what I hear until I verify all the details myself.
How about this: I’ll take your credit card (Visa, Mastercard, Discover, no Amex) and if it’s a network problem I won’t bill you. However, if it’s not a network problem, I’ll be sure to make it painful enough that the next time you’ll definitely do your homework before you come calling.
I’ll close this post out with the following line:
It’s Not Always the Network’s Fault!
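The Nginx example above is a good reminder that “the process is running” and “something is listening on the port” are two different claims. As a rough sketch (the function name `classify_tcp_connect` and the labels it returns are hypothetical), a connect attempt can be sorted into a few buckets. A refused connection usually, though not always, means the host answered and nothing was listening on that port, while a timeout more often points at filtering or a path problem.

```python
import socket


def classify_tcp_connect(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and describe the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The connect was actively rejected: typically the host is reachable
        # but no process has bound the port (a firewall can also do this).
        return "refused"
    except socket.timeout:
        # No answer within the timeout: possibly filtered or unreachable.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS failure or no route to host.
        return f"error: {exc}"
```

A result of "refused" from the internal network would have been a strong hint to log in to the server and look, rather than argue about whether Nginx was up.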
Garrett Michael Hayes says
My favorite is when “it’s your network” is the answer to a problem observed from 4 different networks on 4 different ISPs with 4 completely different architectures, all referencing the same target service. But of course, said target service couldn’t possibly be the problem, could it?
Jamie Darville says
Unfortunately the end user by default always blames the network, so management always blames the network and it gets escalated straight to level 3. In our environment, 90% of the time it’s a storage-related problem.
I find myself always having to do a dd test to each filer just to prove that there is no network issue, and show: “NAS system A is getting full line speed read and write, however NAS system B is getting 1 MB/s read and write access.”
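The dd comparison described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name `write_throughput_mb_s` is made up, the mount paths in the usage note are placeholders, and it measures sequential writes only (a read-back test would mostly hit the local page cache unless the cache is bypassed).

```python
import os
import time


def write_throughput_mb_s(path: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Write total_mb of zeros to path in block_kb chunks and return MB/s.

    Similar in spirit to: dd if=/dev/zero of=path bs=1M count=64 conv=fsync
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        # Force the data out to the storage backend before stopping the clock.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    written_mb = blocks * block_kb / 1024
    return written_mb / elapsed
```

Running it against a test file on each filer’s mount, for example `write_throughput_mb_s("/mnt/nas_a/ddtest.bin")` and `write_throughput_mb_s("/mnt/nas_b/ddtest.bin")` from the same client, gives two numbers over the same network path. If one is at line speed and the other is not, the network is an unlikely suspect.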