Michael McNamara

Migrating from GoDaddy to Porkbun

Michael McNamara — Fri, 21 Nov 2025 18:10:57 +0000

It’s been over a year since I posted here, but I finally found the time to migrate away from GoDaddy. I will once again be able to fully automate renewing my wildcard SSL certificates with Let’s Encrypt while saving some $$$ as well.

This has been a long time coming… and while I didn’t have any really bad experiences with GoDaddy I’ve likely been leaving a good chunk of change on the table every year with the costs I’ve been paying for the few domains that I actually own.

I’ve also started looking at OVHcloud, possibly migrating away from DigitalOcean. Who knows, maybe by the time you are reading this, this site will already have been migrated.

Cheers!

The post Migrating from GoDaddy to Porkbun first appeared on Michael McNamara.

Why isn’t the Let’s Encrypt wildcard automatically renewing? GoDaddy $%&@

Michael McNamara — Fri, 04 Oct 2024 20:51:14 +0000

I’ve been pretty busy with real life as I’m sure everyone is these days… over the summer you likely didn’t notice that the SSL certificate expired on this website. I eventually got around to manually renewing the Let’s Encrypt wildcard SSL certificate because I didn’t have time right then to dig into why my monthly cronjob wasn’t working properly. I realize I’m about 5 – 6 months late on this story but hey it’s my story for today.

It’s Friday and some much needed personal time off and since it’s raining outside I’m left to deal with anything that needs attention inside the house… having emptied all the mouse traps in the garage (that time of year here in Pennsylvania) and having already made my trip to the bank and to the DMV I’m left with digital maintenance – did I mention I built a new PC – no I didn’t did I, I really need to catch up on this blog.

Anyway, back to Let’s Encrypt and GoDaddy… upon digging into the code I find that the API call to GoDaddy is failing with the following message:

{"code":"ACCESS_DENIED","message":"Authenticated user is not allowed access"}

Interesting, let me see if GoDaddy expires the API key or secret like LinkedIn likes to do, perhaps I’ll just regenerate them regardless. After a new API key and secret still no luck, even calling the API via cURL returns the same error message. A quick Google search reveals a few stories that cause some concern…

It would seem that GoDaddy removed access via their API for smaller customers? They probably notified me and I just missed the email message, after all I’m pretty busy. Hmm… nope they didn’t notify me, seven years of email archives and nothing from GoDaddy about them restricting access to their API. I do have a message from them in March of 2022 asking if it was me setting up the original API key and secret. Disappointing but that seems to be the trend for 2024, vendor after vendor and don’t get me started on the Private Equity mess. For the record I have 7 domains with GoDaddy and have been using them since 2007.
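For anyone who wants to reproduce the check outside of the renewal script, here is a minimal Python sketch of the failing call described above. The base URL, the records path and the `sso-key` header format are my assumptions about GoDaddy's public API, and `build_request` and `is_access_denied` are hypothetical helper names. It only builds the request and classifies the error body, it doesn't send anything.

```python
import json
import urllib.request

# Assumption: GoDaddy's production API base URL and "sso-key" header format.
API_BASE = "https://api.godaddy.com/v1"


def build_request(domain, api_key, api_secret):
    """Build (but do not send) a request listing the domain's TXT records."""
    url = f"{API_BASE}/domains/{domain}/records/TXT"
    headers = {
        "Authorization": f"sso-key {api_key}:{api_secret}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def is_access_denied(body):
    """Return True if a JSON error body carries the ACCESS_DENIED code."""
    try:
        return json.loads(body).get("code") == "ACCESS_DENIED"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as a different error.
        return False


# The exact error body returned in the post.
SAMPLE_ERROR = (
    '{"code":"ACCESS_DENIED",'
    '"message":"Authenticated user is not allowed access"}'
)
```

If `is_access_denied(SAMPLE_ERROR)` is true even with a freshly generated key and secret, the problem is the account's API entitlement rather than the credentials.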

I think it’s time to let my money do the talking, even if it requires more of my personal time than I have to offer – it’s really the only voice any of us have.

What do you think?

Cheers!

The post Why isn’t the Let’s Encrypt wildcard automatically renewing? GoDaddy $%&@ first appeared on Michael McNamara.

Juniper EX4100-F-12P power supply failed?

Michael McNamara — Sat, 18 May 2024 17:48:43 +0000

We use a few Juniper EX2300C and recently EX4100-F-12P switches where we have a need. Interesting issue with the EX4100-F-12P, it appears that you can power it over PoE. However, if you power it from a standard power supply you’ll get syslog messages indicating that there is a power supply failure. Junos seems to think because the switch isn’t being powered by PoE that there’s a power supply failure.

Mar 18 16:20:55  EX4100F chassisd[17857]: CHASSISD_SNMP_TRAP6: SNMP trap generated: Power Supply failed (jnxContentsContainerIndex 2, jnxContentsL1Index 1, jnxContentsL2Index 2, jnxContentsL3Index 0, jnxContentsDescr Power Supply 1 @ 0/1/*, jnxOperatingState 6)
Mar 18 16:20:55  EX4100F chassisd[17857]: CHASSISD_SNMP_TRAP6: SNMP trap generated: Power Supply failed (jnxContentsContainerIndex 2, jnxContentsL1Index 1, jnxContentsL2Index 3, jnxContentsL3Index 0, jnxContentsDescr Power Supply 2 @ 0/2/*, jnxOperatingState 6)
Mar 18 17:20:56  EX4100F chassisd[17857]: CHASSISD_SNMP_TRAP6: SNMP trap generated: Power Supply failed (jnxContentsContainerIndex 2, jnxContentsL1Index 1, jnxContentsL2Index 2, jnxContentsL3Index 0, jnxContentsDescr Power Supply 1 @ 0/1/*, jnxOperatingState 6)
Mar 18 17:20:56  EX4100F chassisd[17857]: CHASSISD_SNMP_TRAP6: SNMP trap generated: Power Supply failed (jnxContentsContainerIndex 2, jnxContentsL1Index 1, jnxContentsL2Index 3, jnxContentsL3Index 0, jnxContentsDescr Power Supply 2 @ 0/2/*, jnxOperatingState 6)

We opened a ticket with Juniper and they believe it’s a flaw. The issue is that we monitor over 1,000 switches and we use the syslog feed to create alerts and tickets for review, so now we need to build exemptions into our logging to deal with these false positive alerts.
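An exemption like that could look something like the sketch below. This is not our actual tooling, just a minimal Python illustration that assumes the false positives can be recognized by the hostname plus the "Power Supply failed" trap text. The `EXEMPT_HOSTS` set and the `should_alert` function are hypothetical names.

```python
import re

# Assumption: false positives are identified by hostname plus the
# "Power Supply failed" trap text shown in the syslog messages above.
PSU_TRAP = re.compile(
    r"\s(?P<host>\S+)\s+chassisd\[\d+\]: CHASSISD_SNMP_TRAP\d+: "
    r"SNMP trap generated: Power Supply failed"
)

# Hypothetical list of switches known to run from a standard power supply.
EXEMPT_HOSTS = {"EX4100F"}


def should_alert(line):
    """Return False for known false-positive PSU traps, True otherwise."""
    match = PSU_TRAP.search(line)
    if match and match.group("host") in EXEMPT_HOSTS:
        return False
    return True
```

Keeping the exemption keyed on specific hosts means a genuine power supply failure on any other switch model still raises a ticket.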

Hopefully Juniper will fix this bug.

Cheers!

The post Juniper EX4100-F-12P power supply failed? first appeared on Michael McNamara.

Issues with Palo Alto 10.2.x and GlobalProtect with SAML

Michael McNamara — Thu, 29 Feb 2024 04:34:21 +0000

We’ve been using Palo Alto’s GlobalProtect with Azure SAML successfully for the past 4 years. We have a single portal with multiple gateways deployed globally. We recently started upgrading our Palo Alto firewalls from 9.1.x to address the certificate issues and discovered that GlobalProtect broke when we hit 10.2.x. We were getting the infamous “Failed to get client configuration” error. The firewall was unable to determine the username to use for the LDAP query to get the group membership.

Ultimately we had to go back to our Azure SAML configuration and modify the username attribute such that the SAML response would return “domain\username” format.

Cheers!

Update: March 2, 2024

It turns out that prior to 10.2 the user domain was being learned from a certificate on the client. We issue certificates to all our devices as a second factor, third factor really when you think about MFA. I don’t believe Palo Alto has any intention of “fixing” the issue, hence you need to update your SAML attributes to return “domain\username” in the username attribute.

The post Issues with Palo Alto 10.2.x and GlobalProtect with SAML first appeared on Michael McNamara.

HPE/Aruba Activate goes rogue?

Michael McNamara — Sun, 22 Oct 2023 14:59:43 +0000

It’s been a while… just been busy like everyone else, doing my best to keep the ship moving while not capsizing. I thought I would take an hour here on a Sunday morning and tell you another story. It’s a cautionary tale about the cloud and what can happen when vendors have hooks into your infrastructure.

We use HPE/Aruba Instant APs at many of our locations globally. A while back we had an interesting issue. We had a site reporting that their wireless was down and the team performing the initial troubleshooting reported that they were unable to log into any of the Aruba Instant APs or the virtual controller. I ended up taking the case myself and what I found was troubling. While the VC IP address was still responding to ICMP pings, it appeared as if our entire configuration had been wiped and overlaid with a different configuration.

I would factory reset the IAP to get it back online and shortly after I would lose access to it again once it contacted Aruba Activate – I verified this via my firewall logs.

Ultimately I found that the IAPs appeared to have adopted a configuration from Aruba Activate – the cloud solution from HPE/Aruba to help solve zero touch provisioning and configuration. These IAPs were originally purchased by my organization and had no configuration in Activate, but somehow someone else in Aruba Activate pushed a configuration to our IAPs? I never did learn the answer to who or how that happened, but my HPE/Aruba sales engineer was extremely helpful, working internally within HPE/Aruba to address the issue. For a short term solution I blocked access to HPE Activate at my firewall and then had to factory reset and reconfigure all the Instant Access Points.

There is an option in Instant AOS 8.4.x and later that allows you to disable Activate.

activate-disable

Unfortunately this wouldn’t have worked for us as we’re still running 6.5.4.x on a large number of our IAPs.

Question: Do you know what really happens to your gear when that cloud subscription runs out?

Cheers!

The post HPE/Aruba Activate goes rogue? first appeared on Michael McNamara.

Juniper EX4400 Switch – LLDP missing

Michael McNamara — Thu, 19 Oct 2023 00:54:49 +0000

I recently stumbled into an interesting issue with the latest recommended release for the Juniper EX4400 switch running software release 22.2R3-S2.8. The LLDP table was missing the entries for the neighboring Juniper EX4650 switch that it was uplinked to.

Long story short it turns out that this is a known issue.

You need to add the following configuration statement to your adjacent switch, not the EX4400 itself but the switch on the “other” side of the connection:

set protocols lldp tlv-filter cloud-connect-event

With that statement in the EX4650, the EX4400 would display the appropriate neighboring links in its LLDP table.

Cheers!

The post Juniper EX4400 Switch – LLDP missing first appeared on Michael McNamara.

Juniper EX4400 – Virtual Chassis not working

Michael McNamara — Tue, 17 Oct 2023 01:43:43 +0000

We made the jump from the EX4300 to the EX4400 this year and while things have been good, we’ve seen a number of bugs and issues with the early software releases.

If you run into issues with Virtual Chassis, my first suggestion is to check the software release.

By default, the QSFP28 ports on the back of the Juniper EX4400 should be set up as “Virtual Chassis” ports for stacking. You can issue the following command to change the configuration if needed:

request virtual-chassis mode network-port disable reboot

The issue I found is that ~ 70% of the time a Juniper EX4400 would fail to see the Virtual Chassis ports (and fail to “stack” properly) if it was running 21.2R3.8 software – the software release Juniper was shipping on switches sold in early 2023. An upgrade to 21.4R3-S3.4 or even the current recommendation of 22.2R3-S2.8 immediately resolves the issue.

I’ve also observed a number of odd PoE/interface issues impacting Juniper MIST Access Points, Kronos clocks along with other assorted PoE devices, such that they receive power but are unable to establish a LINK on the port with either 1Gbps or 2.5Gbps.

I’m currently running 21.4R3-S3.4 in production but we’re seeing a lot of intermittent BFD timeouts which we suspect is a software issue. We’re currently testing 22.2R3-S2.8 in a number of locations.

Cheers

The post Juniper EX4400 – Virtual Chassis not working first appeared on Michael McNamara.

HPE/Aruba ClearPass 802.1X auth fails with Android 11

Michael McNamara — Sat, 13 Aug 2022 13:34:04 +0000

This is another one of those “it must be the network” posts. It was an interesting problem to chase so I thought it worth the effort to post it here for anyone that hasn’t seen this problem before.

The trouble ticket came in as a brand new “out of the box” Motorola G Pure was failing to authenticate via RADIUS 802.1X to our wireless network using valid credentials. However, if you managed to get the device connected via guest wireless and enrolled in Soti then it was able to authenticate via RADIUS 802.1X without an issue.

A quick review of the HPE/Aruba ClearPass instance showed an error code 215, a TLS session error, which interestingly enough was reporting an expired certificate. The certificate error was on the client side, which was odd given that historically Android devices don’t validate or care about the RADIUS certificate.

The text of the error read as follows;

EAP-PEAP: fatal alert by client - certificate_expired TLS Handshake failed in SSL_read with error:14094415:SSL routines:ssl3_read_bytes:sslv3 alert certificate expired eap-tls: Error in establishing TLS session

It turns out I’ve seen this issue before with Android 10 but in that case the device was failing to open a captive portal page when connecting to a guest WiFi network because the SSL certificate securing the captive portal was “invalid” to the mobile. Why you ask? The device had the wrong date/time. And that’s exactly what’s happening here… although Android 11 is taking the issue a little further because it views the RADIUS certificate as invalid it’s not allowing the RADIUS 802.1X authentication to proceed.

The issue is that the Motorola G Pure will boot up with a default date and time that appears to be related to the date of that specific software build. In this case the default date was June 30, 2022 – fairly new I’d agree. If there is a SIM in the device it will pull the correct date/time from the cellular network, but if these are just being used on WiFi then they won’t automatically update their date/time until they are connected to a wireless network. Unfortunately we had just recently renewed our RADIUS certificate (publicly signed) on July 14, 2022. While the certificate hadn’t expired, it wasn’t yet valid because the mobile had a date & time that was before the issue date of the certificate.
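The date arithmetic is easy to check. Below is a minimal Python sketch of the validity-window test a client performs, using the two dates from this story. The one-year expiry date is an assumption for illustration since I didn't note the real one, and `certificate_status` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def certificate_status(device_time, not_before, not_after):
    """Classify a certificate against the clock of the validating device."""
    if device_time < not_before:
        return "not yet valid"
    if device_time > not_after:
        return "expired"
    return "valid"


# Default clock on the handset and the issue date of the renewed certificate.
DEVICE_DEFAULT = datetime(2022, 6, 30, tzinfo=timezone.utc)
ISSUED = datetime(2022, 7, 14, tzinfo=timezone.utc)
# Assumption: a one-year lifetime; the real expiry date is not in the post.
EXPIRES = datetime(2023, 7, 14, tzinfo=timezone.utc)

print(certificate_status(DEVICE_DEFAULT, ISSUED, EXPIRES))  # not yet valid
```

Note that the handset reported the failure to ClearPass as `certificate_expired` even though the real condition was "not yet valid", which is part of what made this one confusing to chase.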

This wasn’t an issue in Android 10 because Android 10 didn’t validate the date of the RADIUS certificate, but Android 11 will attempt to validate the RADIUS certificate being used in the RADIUS 802.1X exchange. It should also be mentioned that you’ll need to make sure you have the “Domain” box filled in with the domain of the certificate used by the RADIUS server – that’s new with Android 11 as well.

Cheers!

The post HPE/Aruba ClearPass 802.1X auth fails with Android 11 first appeared on Michael McNamara.

Palo Alto PAN-OS 8.0 Upgrade Failure

Michael McNamara — Mon, 01 Aug 2022 16:00:00 +0000

It turns out that in the year 2022 upgrading from PAN-OS 8.0.x requires a TAC case and an older content update file that’s not readily available on Palo Alto’s Support website. Hopefully this will save someone else some time down the road.

I recently needed to press an older PA-220 that had been lying around in a lab into a production environment due to the supply chain debacle that we’re all currently living in. I reached out to my reseller and had the firewall fully licensed and was able to apply those licenses to the hardware. In preparation for the deployment I tried to bring the PA-220 up to PAN-OS 9.1.14, but I was unable to upgrade past 8.0.20 even with the device being fully licensed.

When I tried to upgrade from 8.0.20 to 8.1.23 I would get an error during the software install, “Failed to install 8.1.23 with the following errors. SW version is 8.1.23 Error: Upgrading from 8.0.20 to 8.1.23 requires a content version of 769 of greater and found 655-3816. Failed to install version 8.1.23 type panos“

Even though the device was fully licensed there were no Dynamic Updates available to download or install. I even tried to manually download them from the Palo Alto support website and install them and that was met with a different error when trying to commit the change.

I opened a case with Palo Alto Support and eventually they provided me content update 8424-6791 which I was able to manually install and apply, after which I was successfully able to upgrade to 8.1.23. I was then able to download and apply the latest and greatest content updates from the webUI and eventually upgrade the firewall to 9.1.14.
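If you want to pre-check a batch of older firewalls before attempting the same jump, the comparison in that error message can be sketched in a few lines of Python. This assumes the installer only compares the leading number of the content version against the minimum of 769, which is my reading of the error text and not documented behaviour. `content_major` and `meets_minimum` are hypothetical helper names.

```python
def content_major(version):
    """Return the leading number of a content version such as '655-3816'."""
    return int(version.split("-", 1)[0])


def meets_minimum(installed, minimum=769):
    """Assumed check: leading content number must be at least the minimum."""
    return content_major(installed) >= minimum


print(meets_minimum("655-3816"))   # False, the version the PA-220 shipped with
print(meets_minimum("8424-6791"))  # True, the version support provided
```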

Cheers!

The post Palo Alto PAN-OS 8.0 Upgrade Failure first appeared on Michael McNamara.

Retirement Planning – Personal Capital

Michael McNamara — Sun, 03 Jul 2022 14:28:15 +0000

I’m at that point in my life where I feel I need to start keeping a closer eye on my retirement investments and make sure that my wife and I will be ready when we decide to retire. I’m not intent on making a lot of changes or moving any investments around but I feel I need to be an educated investor and make sure that my retirement goals are on track. I recognize that writing this post in June of 2022 it is not the best time to want to start tracking your retirement investments, with the US stock market being down ~ 15.0% already this year and the possibility of a recession looming here in the United States.

Here are my personal numbers if anyone is interested, I probably need to re-balance as I’m over committed to US stocks. I’m in it for the long haul, so while the numbers above might be disappointing there isn’t a whole lot I’ll be doing about it right now, it’s better to just stay the course and continue to invest while the market in general is down IMHO.

I’ve been saving toward my retirement since my first job at Manhattan College, and yes, I still have my TIAA-CREF (403b) retirement account that I enrolled in back in 1995. The challenge is trying to manage all the various accounts that either my wife or I have. That’s where I’ve found Personal Capital to be an incredibly useful (and free) tool. There are a ton of great reviews on Personal Capital out on the net so I’m not going to go into any depth here other than to say it’s an incredibly useful tool IMHO. If you don’t feel comfortable managing your own retirement accounts or need help I would strongly suggest you seek professional assistance from a CPA.

Cheers!

Personal Capital Referral Link: https://pcap.rocks/m32436

The post Retirement Planning – Personal Capital first appeared on Michael McNamara.

Elo Touch – 5Ghz Wireless (Channel Support?)

Michael McNamara — Thu, 12 May 2022 02:51:35 +0000

We had an issue a few months back with a number of Elo Touch all-in-one systems. These devices had been installed and working for almost three years and then literally overnight they started having issues connecting to our wireless infrastructure – all at the same time. Oddly enough the issue was only impacting the Elo devices, we had numerous other devices including Lenovo laptops, macOS laptops, Apple iPhones, Zebra TC20/TC21 Handhelds (Android), Zoom Conference TVs (Apple Mac Mini) all working without issues or problems. The initial troubleshooting didn’t turn up anything simple, there were no locked out accounts or other RADIUS 802.1X authentication issues. We just didn’t see the devices in question even trying to associate to any of the APs so we were initially stumped. While we worked to get an engineer onsite we performed the obligatory rolling reboot of the Cisco WLC 5520s (primary and standby) along with the Cisco AP 4800s (they had an uptime of just over 645 days) just to check that box for lack of any other direction at that time.

What was the issue?

In this specific facility we only use the 5Ghz band for our production networks, 2.4Ghz is setup for the guest network. In the end we determined (still waiting on confirmation) that the devices in question don’t appear to support all the 802.11a 5Ghz wireless channels. We found the following reference on several Internet websites.

Elo devices cannot operate on 5G wireless networks utilizing 5.250 to 5.350 GHz OR 5.470 to 5.725 GHz.

I didn’t know the frequencies off the top of my head so I had to look them up… thanks to the folks at Wireless LAN Professionals for the chart below. That potentially removes channels 52-64 and channels 100-144 from being used, only leaving channels 36-48 and I would have to guess the device likely doesn’t support the UNII-3 band and channels 149-165 so that’s super restrictive.

Credit: Wireless LAN Professionals
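Since the chart may not survive in every reader, here is a minimal Python sketch of the same lookup. It uses the standard 5GHz numbering (center frequency in MHz = 5000 + 5 × channel) and the two frequency ranges from the quoted Elo statement. The function names are hypothetical, and comparing only the center frequency against the ranges is a simplification.

```python
# Frequency ranges (MHz) from the quoted statement about Elo devices.
EXCLUDED_MHZ = [(5250, 5350), (5470, 5725)]


def center_mhz(channel):
    """Center frequency of a 5GHz channel in MHz."""
    return 5000 + 5 * channel


def in_excluded_range(channel):
    """True if the channel's center frequency falls in an excluded range."""
    freq = center_mhz(channel)
    return any(low <= freq <= high for low, high in EXCLUDED_MHZ)


for ch in (36, 48, 52, 64, 100, 136, 144, 149, 165):
    print(ch, center_mhz(ch), in_excluded_range(ch))
```

By this reading channels 52-64 and 100-144 fall inside the excluded ranges while 36-48 do not. Channels 149-165 sit above 5.725 GHz, so the quoted statement alone doesn't rule them out. That part is my guess.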

In a large fulfillment center it’s usually feast or famine, too much RF signal or not enough RF signal and it takes a lot of work to find that happy medium.

What happened?

It would appear that Dynamic Channel Assignment (DCA) on the Cisco WLC 5520 changed an AP from channel 48 to channel 136 the morning the issue started (we found the log entry), and that was the only AP in the physical area around the clients that was using any of the channels between 36 and 48. In short the Elo devices were blind to the wireless access points around them because they were on channels that the devices didn’t support. This was later confirmed by performing some remote wireless packet traces from one of the Cisco 4800 APs in sniffer mode. We captured numerous packet traces across numerous 5Ghz channels but we were unable to see any of the Elo devices communicating in any channel other than 36-48. We were looking for active probe requests in the wireless packet traces, which is not foolproof as the client can still listen passively. We manually set the AP back to channel 48 and the devices immediately started working. We’ve temporarily disabled TPC and DCA while we try to validate what channels the device supports.

The Elo vendor reps we contacted claimed that the devices support all the “standard” 5Ghz channels but from the evidence we collected that doesn’t appear to be the case. I hope to be able to get my hands on one of these devices in the coming weeks to try and validate my suspicions.

I still need to confirm but this is really the only explanation that fits the available evidence.

Anyone else ever have such an odd problem?

Cheers!

Update: July 2022

I was able to get my hands on an Elo device and was able to verify that it could in fact communicate in the UNII-2a bands, so I’m not sure what to make of this issue with that new technical tidbit.

The post Elo Touch – 5Ghz Wireless (Channel Support?) first appeared on Michael McNamara.

Let’s Encrypt SSL Wildcard Certificate

Michael McNamara — Fri, 22 Apr 2022 17:42:47 +0000

In July of 2020 I wrote about the relatively cheap cost of a standard SSL certificate from RapidSSLonline in an article titled, “Your certificate expires in 1 day!!!“. While standard SSL certificates were available for ~ $14.99/year at the time, a wildcard SSL certificate is considerably more expensive than a standard SSL certificate. In December 2021 the wildcard SSL certificate that I use on this site was set to expire so I made the decision to try Let’s Encrypt.

I’m happy to report that it’s been an extremely painless adventure with the only caveat being that I had to manually renew the SSL certificate every 90 days. After some research I found that really isn’t an issue thanks to Martijn Veldpaus. Martijn has written some scripts that help bring together certbot and the API calls to GoDaddy (I’m using GoDaddy as my domain registrar and as my DNS provider) to perform the DNS verification that’s required by Let’s Encrypt to prove that you own the domain.
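For the curious, the DNS verification boils down to publishing one TXT record. The Python sketch below shows how the record name and value are derived under the ACME DNS-01 challenge (RFC 8555): the value is the unpadded base64url SHA-256 digest of the key authorization. This is not Martijn's code, just an illustration. The helper names and the sample key authorization string are made up.

```python
import base64
import hashlib


def challenge_record_name(domain):
    """TXT record name for a DNS-01 challenge; wildcards use the base name."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def challenge_txt_value(key_authorization):
    """Unpadded base64url of the SHA-256 digest of the key authorization."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Hypothetical key authorization ("<token>.<account key thumbprint>").
print(challenge_record_name("*.example.com"))
print(challenge_txt_value("example-token.example-thumbprint"))
```

certbot computes the value for you; the script only has to push it to the DNS provider's API and wait for it to propagate.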

I’m saving myself about $149/year by using Let’s Encrypt instead of a traditional Certificate Authority.

If you are a GoDaddy customer looking for an extremely easy way to set up the automated renewal of your wildcard SSL certificates with Let’s Encrypt I would strongly suggest you check out Martijn’s GitHub repository Certbot-Godaddy.

Cheers!

The post Let’s Encrypt SSL Wildcard Certificate first appeared on Michael McNamara.

Ansible Default Forks = 5

Michael McNamara — Fri, 15 Apr 2022 14:10:15 +0000

We recently started using Ansible to help perform software upgrades on the large number of Juniper EX-4300 and EX-2300 switches in our environment. Like the vast majority of organizations our downtime windows are extremely short and unfortunately the element of human error is usually greater than the standard mean between 12AM and 6AM. Thankfully Ansible solves most of these issues and is very reliable. Out of the box, Ansible has a configuration default of 5 forks and as such it will only upgrade 5 switches at a time. If you are going to be working with any sizable number of devices you’ll need to update the configuration value in the ansible.cfg file.

[defaults]
inventory = inventory
host_key_checking = False
log_path = ~/ansible/ansible.log
forks = 30
timeout = 60

You’ll need to make sure that whatever server or virtual machine is running your Ansible instance can support the number of forks you configure.
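To get a feel for what the setting buys you, here is a rough Python sketch that reads `forks` from an ansible.cfg like the one above and estimates how many waves of hosts a task needs. It assumes each task runs on at most `forks` hosts at once and ignores everything else that affects run time. The 1,000 host count and the helper names are hypothetical.

```python
import configparser
import math

SAMPLE_CFG = """\
[defaults]
inventory = inventory
host_key_checking = False
log_path = ~/ansible/ansible.log
forks = 30
timeout = 60
"""


def read_forks(cfg_text, default=5):
    """Return the forks value, falling back to Ansible's default of 5."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("defaults", "forks", fallback=default)


def waves_needed(host_count, forks):
    """Rough number of batches needed to touch every host once."""
    return math.ceil(host_count / forks)


print(waves_needed(1000, 5))                       # 200
print(waves_needed(1000, read_forks(SAMPLE_CFG)))  # 34
```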

Cheers!

The post Ansible Default Forks = 5 first appeared on Michael McNamara.

Raspberry Pi 4 Bullseye WiFi – Country Code

Michael McNamara — Tue, 15 Mar 2022 01:52:27 +0000

I recently had the opportunity to setup a Raspberry Pi 4 in a headless configuration and ran into an interesting issue around the WiFi configuration with Bullseye.

When logging in via SSH the following text was visible at the bottom of the motd;

Wi-Fi is currently blocked by rfkill.
Use raspi-config to set the country before use.

It turns out that Bullseye will disable the WiFi driver in the kernel unless the country code is set.

This is really only an issue if you are using the Raspberry Pi in a headless configuration without the desktop GUI.

What’s the workaround?

I edited the wpa_supplicant.conf file in /etc/wpa_supplicant as follows, adding the US country code;

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=US

Then you need to copy the file /etc/wpa_supplicant/wpa_supplicant.conf to /boot and reboot.

sudo cp /etc/wpa_supplicant/wpa_supplicant.conf /boot
sudo init 6

When the Raspberry Pi booted back up… the wireless driver was loaded and I was able to connect to the intended wireless network.
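If you build a lot of these headless, it might be worth checking the file before the first boot. A minimal Python sketch, assuming the country line looks like the one above. `wifi_country` is a hypothetical helper name.

```python
import re

SAMPLE_CONF = """\
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=US
"""


def wifi_country(conf_text):
    """Return the two-letter country code from the config text, or None."""
    match = re.search(
        r"^\s*country\s*=\s*([A-Za-z]{2})\s*$", conf_text, re.MULTILINE
    )
    return match.group(1).upper() if match else None


print(wifi_country(SAMPLE_CONF))  # US
```

A result of `None` means the image will likely come up with Wi-Fi blocked by rfkill, as described above.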

Cheers!

The post Raspberry Pi 4 Bullseye WiFi – Country Code first appeared on Michael McNamara.

APC UPS NMC stops responding via HTTPS

Michael McNamara — Sun, 20 Feb 2022 15:22:36 +0000

Who doesn’t love a good mystery, I’m no exception. A few weeks back we had an interesting issue pop-up. It was midnight on a Sunday night and PagerDuty started firing off an alert that a UPS in one of our distribution centers had just stopped responding via HTTPS. The UPS was still online responding to both ICMP and SNMP traffic, so the alert was acknowledged and the alarm was paused until it could be reviewed the next day.

The UPS itself was fairly new having been installed just under a year ago. It was a Schneider Electric/APC 3000RT Smart-UPS with an AP9631 network management card. Interestingly enough we just had an issue with a brand new APC 8000SRT Smart-UPS with an integrated AP9537SUM network management card that had essentially started doing the same exact thing a few days earlier. Only that installation was only a few days old when it stopped working. Again ICMP and SNMP worked fine… as did HTTP (if you enabled it).

What was that all about?

After a few hours of troubleshooting and digging I discovered that the self-signed SSL certificate installed on the NMC had expired. Any attempt to connect to the NMC via HTTPS after that point would result in the NMC immediately closing the socket upon connecting. Removing the self-signed SSL certificate and rebooting the NMC caused the self-signed SSL certificate to be regenerated and the problem was resolved. You can remove the SSL certificate by enabling the HTTP server via either SSH or TELNET (which one is enabled by default will depend on the age of your card), logging in via HTTP, going to Configuration -> Network -> Web -> SSL Certificate and selecting Remove and Apply. You then just need to reboot the NMC and you should be able to connect via HTTPS.

1 Year?

The self-signed SSL certificate is only good for one year, after which you’ll need to regenerate it again. The latest version of the firmware/software (NMC2 – v7.0.4) from APC sets the expiration date for all self-signed SSL certificates out to 2035 – not sure if the web browsers will start to complain about that.
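Since a one-year certificate will quietly bite again, it's worth tracking the expiry date. Here is a minimal Python sketch that turns a certificate's notAfter string into days remaining. It assumes the date is in the "Mon DD HH:MM:SS YYYY GMT" form that Python's `ssl` module uses, and the sample date and the helper name are made up.

```python
import ssl
from datetime import datetime, timezone


def days_remaining(not_after, now=None):
    """Whole days until the certificate expires (negative once expired)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # cert_time_to_seconds() parses the notAfter format used by the ssl module.
    expiry = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expiry - now).days


# Hypothetical expiry date and check date, for illustration only.
check_date = datetime(2022, 12, 22, tzinfo=timezone.utc)
print(days_remaining("Jan  1 00:00:00 2023 GMT", now=check_date))  # 10
```

Feeding that number into whatever already pages you for ICMP and SNMP would have turned our midnight alert into a routine ticket a few weeks earlier.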

Ripple20?

If you haven’t already patched your APC network management cards it might be a good time to take care of that task as well. We had to patch all of our APC and Eaton network management cards that are used throughout our network.

Cheers!

The post APC UPS NMC stops responding via HTTPS first appeared on Michael McNamara.

PanOS 9.1.12 breaks GlobalProtect VPN

Michael McNamara — Thu, 03 Feb 2022 22:00:00 +0000

When possible it’s always a good idea to test any software upgrades, because you just never know what you’re going to get. That was the case recently when I upgraded our test PA-220 from 9.1.7 to 9.1.12-h3 and it seemingly broke all GlobalProtect VPN functionality. The portal didn’t respond on TCP/443 at all, so it looked like the firewall itself was dropping the traffic.

The issue turned out to be Strict IP Address Check which was just “resolved” or enabled in 9.1.12.

PAN-175934 Fixed an issue where packet-based zone protection settings (such as
Strict IP Address Check) were not applied to return traffic.

When I disabled Strict IP Address Check on the zp_untrusted zone protection profile GlobalProtect started working again.

What is Strict IP Address Check?
Check that both of the following conditions are true:

The source IP address is not the subnet broadcast IP address of the ingress interface.
The source IP address is routable over the exact ingress interface.

If either condition is not true, discard the packet.
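To make those two conditions concrete, here is a small Python sketch of how I read them. It is not how PAN-OS implements the check. The interface names, addresses and route table are hypothetical, and "routable over the exact ingress interface" is modelled as a longest-prefix route lookup on the source address.

```python
import ipaddress

# Hypothetical interfaces and route table for illustration only.
INTERFACES = {
    "ethernet1/1": ipaddress.ip_interface("203.0.113.2/24"),
    "ethernet1/2": ipaddress.ip_interface("10.0.0.1/24"),
}
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "ethernet1/1"),
    (ipaddress.ip_network("203.0.113.0/24"), "ethernet1/1"),
    (ipaddress.ip_network("10.0.0.0/24"), "ethernet1/2"),
]


def passes_strict_ip_check(source, ingress, interfaces=INTERFACES, routes=ROUTES):
    """Return True if the packet would be kept under both conditions."""
    src = ipaddress.ip_address(source)
    # Condition 1: source must not be the ingress subnet's broadcast address.
    if src == interfaces[ingress].network.broadcast_address:
        return False
    # Condition 2: the best route back to the source must use the ingress.
    matches = [(net, ifname) for net, ifname in routes if src in net]
    if not matches:
        return False
    _, best_interface = max(matches, key=lambda item: item[0].prefixlen)
    return best_interface == ingress
```

With this toy table, an Internet source such as 198.51.100.7 arriving on ethernet1/1 passes, while a 10.0.0.50 source arriving on the same interface is discarded.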

Looks like a bug to me.

Cheers!

The post PanOS 9.1.12 breaks GlobalProtect VPN first appeared on Michael McNamara.

AOL (Verizon) breaks Microsoft Outlook

Michael McNamara — Sun, 30 Jan 2022 13:20:10 +0000

What is going on with AOL and Microsoft Outlook?

I’m a Verizon FiOS customer and was migrated to AOL back in 2017. Within the past 30 days I’ve heard and seen a number of issues with people connecting to their AOL inbox from traditional email clients such as Microsoft Outlook, Thunderbird or even the native email clients on iPhone and Android.

The loving wife had this same issue and I wrongly assumed end user error. You would think I’ve learned by now to not jump to conclusions. It seems she’s not the only person with issues as there are numerous posts on numerous message boards all within the past 30 days with dozens if not hundreds of people reporting the same issue.

The general consensus is that:

Verizon/AOL accounts require an AOL “App Password” to be used as the password for the account configured in Outlook or in any email client (iPhone, Android, Thunderbird, Outlook, etc)

What’s more interesting is that AOL apparently is not blasting out this new feature to all users at the same time because my Microsoft Outlook 365 client continues to work fine while my wife and many others are having to generate an “app password” to get their email flowing again. Some of the posts suggest that if you’ve activated “2-step verification” on your AOL account that you’ll need to generate and use an “app password” to access your email from a legacy email client.

I did find the following article from AOL:
https://help.aol.com/articles/allow-apps-that-use-less-secure-sign-in

The article linked above suggests that AOL is actively blocking clients that it believes are less than secure. Is that because the client is passing the username/password in the clear (unencrypted) in a legacy POP3 connection and not using IMAPS or POP3S?

If your traditional email client stops working it might be more than just a password issue. You might want to try either upgrading your email client or setting up an AOL app password and see if that resolves your issue.

Sign in and go to the AOL Account security page. You can do this by signing on to AOL from a computer.
Click Generate app password or Manage app passwords.
Select your app from the drop down menu and click Generate.
Follow the instructions below the password. Be sure to enter the password into your app without any spaces. Click Done.
Use this app password and your email address to sign in to your email app.

Cheers!

The post AOL (Verizon) breaks Microsoft Outlook first appeared on Michael McNamara.

It’s never a DNS issue right?

Michael McNamara — Sun, 23 Jan 2022 22:02:02 +0000

I stumbled into an interesting issue today that gave me a smile when I determined it was a DNS issue.

I was doing some consulting work around WireGuard for a client, and noticed a number of odd issues and just general wonky behavior with everything being slow. This specific client uses Ubuntu Linux while I’m more of a RedHat/CentOS/Rocky guy so I thought it was an issue with the DNS caching that Ubuntu utilizes in systemd-resolved. A few quick tests using a Windows client proved that the issues weren’t limited to just the Ubuntu server; it was impacting every device. DNS queries were taking between 5 and 6 seconds and some were timing out entirely.

The client had mentioned some oddities and issues and I thought there might be a duplicate IP on the network – pretty standard affair in some networks. This wasn’t a duplicate IP issue so I went straight to the DNS servers themselves – Microsoft Windows Server 2019. I found that the forwarders for each server were set up to use some very old Verizon DNS servers – and wouldn’t you know that some of them were no longer responding. I removed all the Verizon entries and added the two standard Google DNS servers – 8.8.8.8, 8.8.4.4. After applying that and restarting each DNS server the problem was gone and everything was running smoothly again.
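
If you want to check your own forwarders for this kind of slow or dead upstream, here is a minimal sketch in Python, assuming only the standard library. The function names (build_query, time_forwarder), the example.com test name and the 2 second timeout are all my own placeholders, not anything from the client's environment. It sends one A record query over UDP and reports how long the answer took, or None if nothing came back.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is length-prefixed; the name ends with a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_forwarder(server: str, name: str = "example.com", timeout: float = 2.0):
    """Return the response time in seconds, or None if the server never answered."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        started = time.monotonic()
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable servers
            return None
        return time.monotonic() - started
```

Calling time_forwarder("8.8.8.8") and time_forwarder("8.8.4.4") in a loop over your forwarder list should quickly show which entries are slow or not answering at all. It only checks that some reply arrived; it does not validate the answer.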

What do you use for your DNS forwarders? Or do you rely on the root hints file maintained by InterNIC?

Cheers!

The post It’s never a DNS issue right? first appeared on Michael McNamara.

HPE/Aruba Instant Access Points – mixing models on the same virtual controller

Michael McNamara — Tue, 02 Nov 2021 23:32:49 +0000

In the past if you wanted to mix an Aruba IAP-100 series and an Aruba IAP-200 series in the same network and virtual controller you had to make sure that both APs were running the same software/firmware revision prior to trying to pair them together. If you didn’t you’d end up with one AP becoming the virtual controller and the other one would just continually reboot trying to join the virtual controller because it was unable to upgrade itself as the software image between classes/models is different.

I recently discovered that this is no longer an issue… APs that are not managed by Airwave (AMP) will reach out to the Internet (Aruba Central? or Aruba Activate?) and upgrade themselves without issue to whatever version the virtual controller is running. And APs that are managed by Airwave will also upgrade themselves so long as the upgrade image is downloaded and installed into AMP for the APs to retrieve.

This is a really nice feature, and helps simplify break-fix issues when older APs die and need to be replaced but you don’t have any IAP-135s available. Now you can use IAP-215s or any 200 series APs and whether or not you have Airwave your AP will be upgraded to the correct software to work properly.

You can mix and match APs based on software release…. IAP-135s and IAP-215s running 6.4.x software work well together, as will IAP-215s, IAP-315s and even IAP-515s running 8.6.x software.

Cheers!

Update: Friday November 11, 2021

There is a known issue with older software releases that will break the ability to upgrade from the cloud. The AP in question needs to be on a “newer” release in order to establish an SSL session to the cloud. Additional details can be found in Aruba Support Advisory ARUBA-SA-20191219-PLVL08 titled Aruba Instant Certificate Expiry Issue.

The post HPE/Aruba Instant Access Points – mixing models on the same virtual controller first appeared on Michael McNamara.

PA TAP 529 Investment Plan for College

Michael McNamara — Tue, 02 Nov 2021 02:25:08 +0000

While this topic is very different from the usual content I write, I feel it will have value for those young adults with children that are sure to be following a similar track in life: “How do I pay for my child’s college education?” I’m not financially savvy by any means, but here’s your call to action if you haven’t yet done anything to start saving.

I’m a Gen Xer and I would consider myself middle income. I’m not rich or poor by any means, but I don’t want for much either. I buy a car/SUV every 10 years or so, mow my own lawn, pay my monthly mortgage and yearly taxes. I hold a full-time job with a large retailer, I run my own consulting business and I try to volunteer regularly with a number of organizations. With three daughters I wasn’t exactly sure how I was going to save for their college education. After a lot of reading and research I decided that a Pennsylvania TAP 529 plan was the best tool and provided the most benefits for me and my family being a Pennsylvania resident. The biggest benefit is that all my TAP 529 contributions are tax deductible at the state level. In 2020 I believe the max contribution per beneficiary was $14,000. So I could contribute $14,000 to each of my TAP 529 plans and have those contributions deducted from my income on my state taxes. This will generally save me a few thousand dollars in taxes, which I can then re-invest back into the TAP 529 accounts. In addition, the funds I contribute to the TAP 529 are excluded from the FAFSA application for student aid.

I ended up selecting the PA 529 Investment Plan, and that’s where the money has been growing for the past few years. There’s a lot of flexibility in how the funds can be allocated; if you are interested in taking an active part you can select from a myriad of options. Or you can set it and forget it and the plan will automatically re-allocate the funds to less risky investments the closer your child gets to college age.

My Thoughts

It’s never too late to start saving or investing. Whether you are saving for your child’s college education or for your eventual retirement, there are plenty of ways to start saving and investing today. In 2018 I opened an account with Betterment, a robo-advisor. That account has provided a rate of return around 9.7% annually, not a phenomenal number by any stretch but it’s definitely better than 0%.

What are you doing today to save for your child’s college education or your retirement?

Cheers!

The post PA TAP 529 Investment Plan for College first appeared on Michael McNamara.

Cisco Nexus 9300 SSD Firmware Issue

Michael McNamara — Sun, 31 Oct 2021 13:48:23 +0000

I recently stumbled into yet another interesting issue that turned out to be a bug in the SSD firmware of some Cisco Nexus 9000 Series switches. We had performed an upgrade in two of our Data Centers just over 3 years ago using the Cisco Nexus 9000 Series product line providing a 10/40Gbps network. Within the past week we had several of those switches crash and reboot themselves. Upon further investigation I found some switches that didn’t crash or reboot themselves were running with a read-only file system. It turned out that this was a known bug that had been identified by Cisco earlier this year.

Field Notice: FN – 72150 – Nexus 9000/3000 Will Fail With SSD Read-Only Filesystem – Power Cycle Required – BIOS/Firmware Upgrade Recommended

The issue was further compounded by some sloppy management, with several switches having unsaved configurations or having crashed and rebooted with unsaved configurations and ultimately inconsistent VPC states. In the short term I ended up deploying the SSD firmware update to all the impacted Cisco Nexus 9000 series switches in my network. I’ll look at performing the recommended software upgrades early next year.

You can set up notifications on the Cisco website to help keep you informed of field notices, software releases and security bulletins.

Anyone else run into this problem?

Cheers!

The post Cisco Nexus 9300 SSD Firmware Issue first appeared on Michael McNamara.

Making the leap to Rocky Linux 8.4

Michael McNamara — Sat, 30 Oct 2021 14:09:48 +0000

You always need to be learning in the technology field. It’s a field that is constantly evolving, and to that point you need to be constantly expanding your knowledge and testing out new products, methods, solutions, etc.

I’m not a big fan of Oracle Linux for a number of reasons, which I’m not interested in diving into here, so today I’m moving this server from CentOS 7.9 to Rocky Linux 8.4.

I’m also taking the opportunity to downsize my server since my daughters are no longer spending hours upon hours playing Minecraft – life is returning to normal, if only slowly. This will give me an opportunity to test out Rocky Linux and decide which operating system I’ll be using going forward in my personal and professional endeavors.

CentOS Linux release 7.9.2009 (Core)
MariaDB 10.5.12
nginx/1.20.1
PHP 7.4.25

Rocky Linux release 8.4 (Green Obsidian)
MariaDB 10.3.28
nginx/1.14.1
PHP 8.0.12

I’m trying to only spend a few hours doing this so I’m going to stick with the standard MariaDB and nginx packages that are available in the repos, although I’m upgrading to PHP 8.0 using the Remi repo. Upgrading to PHP 8.0 is going to cause me some headaches because I’m using some older WordPress plugins that are likely to break and I’ll need to pull them off the site.

If you want to live migrate a server, there’s lots of documentation and tools available to help you.

Have you done any work with Rocky Linux? I’d be curious to hear your take.

Cheers!

The post Making the leap to Rocky Linux 8.4 first appeared on Michael McNamara.

How to troubleshoot Facebook, Instagram, WhatsApp outages?

Michael McNamara — Mon, 04 Oct 2021 20:52:27 +0000

Things certainly went south for Facebook today in a spectacular way as Reddit and other forums lit up with posts about Facebook, Instagram and WhatsApp being down and unreachable. Someone asked me a simple question: how do you troubleshoot an outage like that? We’re obviously limited as “outsiders” but even as a regular netizen we can do a bit of investigative troubleshooting to get some idea of what’s going on at Facebook.

If you tried to visit Facebook earlier today you would have likely seen this message in your web browser.

This site can’t be reached
www.facebook.com’s server IP address could not be found.

Let’s start with the basics…. DNS resolution.

[root@woodstock ~]# dig facebook.com +short
[root@woodstock ~]#

That’s not good… we can’t get an IP address for facebook.com, let’s try www.facebook.com as well.

[root@woodstock ~]# dig www.facebook.com +short
[root@woodstock ~]#

Ok, equally bad… let’s try to find the authoritative DNS servers for the domain facebook.com. We know from experience that a.gtld-servers.net. is a top level DNS server for the .com TLD, but let’s confirm it’s still in the list of servers. (I’ll edit the output below to help save space and focus our attention)

[root@woodstock ~]# dig ns com

;; ANSWER SECTION:
com. 170780 IN NS b.gtld-servers.net.
com. 170780 IN NS i.gtld-servers.net.
com. 170780 IN NS m.gtld-servers.net.
com. 170780 IN NS j.gtld-servers.net.
com. 170780 IN NS l.gtld-servers.net.
com. 170780 IN NS e.gtld-servers.net.
com. 170780 IN NS k.gtld-servers.net.
com. 170780 IN NS h.gtld-servers.net.
com. 170780 IN NS g.gtld-servers.net.
com. 170780 IN NS d.gtld-servers.net.
com. 170780 IN NS c.gtld-servers.net.
com. 170780 IN NS a.gtld-servers.net.
com. 170780 IN NS f.gtld-servers.net.

;; ADDITIONAL SECTION:
a.gtld-servers.net. 69518 IN A 192.5.6.30
b.gtld-servers.net. 82780 IN A 192.33.14.30
c.gtld-servers.net. 84678 IN A 192.26.92.30
d.gtld-servers.net. 84679 IN A 192.31.80.30
e.gtld-servers.net. 84678 IN A 192.12.94.30
f.gtld-servers.net. 84138 IN A 192.35.51.30
g.gtld-servers.net. 84679 IN A 192.42.93.30
h.gtld-servers.net. 84678 IN A 192.54.112.30
i.gtld-servers.net. 84679 IN A 192.43.172.30
j.gtld-servers.net. 82780 IN A 192.48.79.30
k.gtld-servers.net. 84679 IN A 192.52.178.30
l.gtld-servers.net. 84138 IN A 192.41.162.30
m.gtld-servers.net. 84679 IN A 192.55.83.30
a.gtld-servers.net. 81113 IN AAAA 2001:503:a83e::2:30

Ok, so a.gtld-servers.net is still in there… so let’s ask that DNS server who are the DNS servers for the domain facebook.com.

[root@woodstock ~]# dig @a.gtld-servers.net. ns facebook.com

;; QUESTION SECTION:
;facebook.com. IN NS

;; AUTHORITY SECTION:
facebook.com. 172800 IN NS a.ns.facebook.com.
facebook.com. 172800 IN NS b.ns.facebook.com.
facebook.com. 172800 IN NS c.ns.facebook.com.
facebook.com. 172800 IN NS d.ns.facebook.com.

;; ADDITIONAL SECTION:
a.ns.facebook.com. 172800 IN A 129.134.30.12
a.ns.facebook.com. 172800 IN AAAA 2a03:2880:f0fc:c:face:b00c:0:35
b.ns.facebook.com. 172800 IN A 129.134.31.12
b.ns.facebook.com. 172800 IN AAAA 2a03:2880:f0fd:c:face:b00c:0:35
c.ns.facebook.com. 172800 IN A 185.89.218.12
c.ns.facebook.com. 172800 IN AAAA 2a03:2880:f1fc:c:face:b00c:0:35
d.ns.facebook.com. 172800 IN A 185.89.219.12
d.ns.facebook.com. 172800 IN AAAA 2a03:2880:f1fd:c:face:b00c:0:35

Those are the DNS servers for the domain facebook.com, so let’s see if we can communicate with any of them.

Let’s start by pinging the servers (for brevity I’m only going to go through the first server above… but they all were having issues today)

[root@woodstock ~]# ping a.ns.facebook.com -c 5 -q
PING a.ns.facebook.com (129.134.30.12) 56(84) bytes of data.

--- a.ns.facebook.com ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 3999ms

That’s not completely unexpected, as many networks today block ICMP traffic by default to help protect against DoS attacks, so let’s try a simple DNS query to that server.

[root@woodstock ~]# dig @a.ns.facebook.com ns facebook.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.5 <<>> @a.ns.facebook.com ns facebook.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

That’s definitely not good, so we can assume at this point that we’re unable to communicate with the DNS servers for the facebook.com domain name, hence the error message we’re getting in the web browser. But let’s dig a little deeper to see if the IP networks that are associated with those DNS servers are “online” and reachable. We can do that by looking at a BGP looking glass or a full BGP routing table to see if that prefix is being advertised. We can also try to traceroute to the IP address in question and see if we can reach the Facebook network.

Let’s use WHOIS to see what network that IP address is a member of (again I’ve cut out some of the output below).

[root@woodstock ~]# whois 129.134.30.12
[Querying whois.arin.net]
[whois.arin.net]

NetRange: 129.134.0.0 - 129.134.255.255
CIDR: 129.134.0.0/16
NetName: THEFA-3
NetHandle: NET-129-134-0-0-1
Parent: NET129 (NET-129-0-0-0-0)
NetType: Direct Assignment
OriginAS:
Organization: Facebook, Inc. (THEFA-3)
RegDate: 2015-05-13
Updated: 2015-05-13
Ref: https://rdap.arin.net/registry/ip/129.134.0.0

Ok, so the original netblock assigned to Facebook from ARIN was 129.134.0.0/16 but Facebook could have subnetted that so we need to be mindful that the advertised prefix could be smaller than the /16 we see allocated above.

There was a mention in some of the forums that all BGP peers to Facebook were down, so let’s check there. Let’s look at Hurricane Electric’s Network Looking Glass using the IP address of 129.134.30.12. That shows us the following (as of 5:00PM EDT Monday October 4, 2021).

core1.mnz1.he.net> show ip bgp routes detail 129.134.30.12
Number of BGP Routes matching display condition : 2
S:SUPPRESSED F:FILTERED s:STALE x:BEST-EXTERNAL
1 Prefix: 129.134.0.0/17, Rx path-id:0x00000000, Tx path-id:0x00000001, rank:0x00000001, Status: BI, Age: 28d7h21m27s
NEXT_HOP: 65.49.109.182, Metric: 1486, Learned from Peer: 216.218.252.172 (6939)
LOCAL_PREF: 100, MED: 0, ORIGIN: igp, Weight: 0, GROUP_BEST: 1
AS_PATH: 3491 32934
COMMUNITIES: 6939:1111 6939:7039 6939:8392 6939:9003
2 Prefix: 129.134.0.0/17, Rx path-id:0x00000000, Tx path-id:0x00040001, rank:0x00000002, Status: Ex, Age: 86d22h8m40s
NEXT_HOP: 62.115.42.144, Metric: 0, Learned from Peer: 62.115.42.144 (1299)
LOCAL_PREF: 70, MED: 48, ORIGIN: igp, Weight: 0, GROUP_BEST: 1
AS_PATH: 1299 32934
COMMUNITIES: 6939:2000 6939:7297 6939:8840 6939:9001
Last update to IP routing table: 2d3h2m25s

Entry cached for another 60 seconds.

So it would appear that the routes are in the Internet BGP tables for that first server… I’m going to guess that Facebook is in recovery mode and slowly restoring their network – assuming it’s not a DoS attack or something similar.

Let’s try a traceroute using ICMP packets. Again we need to be mindful that some organizations will block all ICMP traffic to protect themselves against the miscreants and to better conceal their network topology.

[root@woodstock ~]# traceroute -I 129.134.30.12
traceroute to 129.134.30.12 (129.134.30.12), 30 hops max, 60 byte packets
1 107.170.19.254 (107.170.19.254) 4.061 ms 4.040 ms 4.037 ms
2 138.197.248.154 (138.197.248.154) 1.545 ms 1.558 ms 1.558 ms
3 157.240.71.232 (157.240.71.232) 41.384 ms 41.345 ms 41.380 ms
4 157.240.42.70 (157.240.42.70) 1.893 ms 1.911 ms 1.913 ms
5 157.240.40.230 (157.240.40.230) 3.552 ms 3.529 ms 3.538 ms
6 129.134.47.188 (129.134.47.188) 8.797 ms 7.276 ms 7.229 ms
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *

Ok, so we’re definitely reaching parts of the Facebook network, as 129.134.47.188 is on the same advertised network as a.ns.facebook.com (129.134.30.12).
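
That prefix membership is easy to sanity check. A minimal Python sketch, assuming only the standard ipaddress module, with the /16 taken from the WHOIS output and the /17 from the looking glass output above. The covered_by helper name is mine, and 129.134.200.1 is just a made-up address from the upper half of the /16 to show the difference between what was allocated and what was being advertised.

```python
import ipaddress

# Prefixes copied from the WHOIS and looking glass output above.
ALLOCATED = ipaddress.ip_network("129.134.0.0/16")
ADVERTISED = ipaddress.ip_network("129.134.0.0/17")


def covered_by(address: str, prefix: ipaddress.IPv4Network) -> bool:
    """Return True if the address falls inside the given prefix."""
    return ipaddress.ip_address(address) in prefix


if __name__ == "__main__":
    # Name server, last responding traceroute hop, and a hypothetical address.
    for addr in ("129.134.30.12", "129.134.47.188", "129.134.200.1"):
        print(addr, covered_by(addr, ADVERTISED), covered_by(addr, ALLOCATED))
```

Both the name server and the last responding hop fall inside 129.134.0.0/17, while the made-up address only matches the wider /16 allocation.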

Unfortunately that’s about as far as we can take it from here, we’ll need to wait for the news from Facebook itself.

Cheers!

The post How to troubleshoot Facebook, Instagram, WhatsApp outages? first appeared on Michael McNamara.

How does latency impact network throughput?

Michael McNamara — Tue, 28 Sep 2021 16:49:15 +0000

I was recently having a conversation with a DevOps colleague (let’s not jeer too loudly) who was trying to understand why he wasn’t getting more than 350Mbps between two servers over a 1Gbps WAN connection. He thought there must be a problem with the network and suggested that I should open a ticket with the carrier to “fix” the issue. I attempted to explain to him that it was the latency and distance between the two servers (3,000 miles) that was limiting the TCP performance and he could potentially overcome that issue by using multiple TCP sockets with larger TCP window sizes, or potentially switch to UDP instead of TCP.

I used iPerf3 to demonstrate the issue… with a single stream/thread we were able to achieve ~ 350Mbps. With a second stream/thread we were able to hit ~ 600Mbps. With a third stream/thread we were able to hit ~ 789Mbps.

It wasn’t magic…. it’s the well-known fact that latency plays a huge role in TCP performance. In order to understand why it impacts TCP performance you need to understand how TCP works. TCP requires that transmitted data sets are acknowledged before the next set of data can be transmitted. The TCP window size determines the size of those data sets; a larger TCP window size allows more data to be transmitted before an acknowledgement is required. The delay in getting the acknowledgement back is what limits the performance.

There is a well-written blog article from NetBeez written by Stefano Gridelli titled, Impact of Packet Loss and Round-Trip Time on Throughput that covers this topic in great detail. You can even apply a mathematical formula to determine the max potential throughput given a known RTT latency.
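
As a rough illustration, here is a minimal Python sketch of the simplest version of that math: a single TCP stream can deliver at most one full window per round trip, so throughput is capped at window size divided by RTT. This ignores packet loss and is not necessarily the exact formula from the article. The 70 ms RTT and the window sizes are assumptions picked for the example, not measurements from the WAN link in this story, and the function names are mine.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP stream: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: the window required to keep the link full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    rtt = 0.070  # assumed round-trip time in seconds; measure yours with ping
    for window in (64 * 1024, 1024 * 1024, 3 * 1024 * 1024):  # assumed window sizes
        mbps = max_tcp_throughput_bps(window, rtt) / 1e6
        print(f"{window:>8} byte window -> {mbps:7.1f} Mbps max")
    print(f"window to fill 1 Gbps: {window_needed_bytes(1e9, rtt) / 1e6:.2f} MB")
```

With those assumed numbers a 3 MiB window tops out at roughly 360 Mbps per stream, and filling a 1Gbps link would need a window of about 8.75 MB. Each additional stream brings its own window, which is why adding streams in iPerf3 raises the total.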

Cheers!

The post How does latency impact network throughput? first appeared on Michael McNamara.

Lenovo ThinkPad T14 with Realtek 8852AE Wireless Issues

Michael McNamara — Sun, 22 Aug 2021 14:16:16 +0000

I’m still alive, just super busy these days… here’s a quick one for anyone using the Lenovo ThinkPad T14 (the issue also impacts a bunch of other models).

It turns out there are multiple models of the Lenovo ThinkPad T14, one with an Intel wireless NIC and one with a Realtek wireless NIC. We quickly discovered that the model with a Realtek RTL8852AE WiFi 6 802.11ax PCIe adapter was having a lot of issues staying connected to a number of different Cisco Wireless LAN Controllers in different physical locations. The symptom displayed to the user as an inability to pull a DHCP address, even though the device showed it was connected to the SSID. In the end it turns out that a driver released on August 10, 2021 (6001.0.10.334) apparently fixes an issue when clients are using a Cisco wireless infrastructure. Unfortunately there’s no mention of what exactly the issue was in the release notes.

You can find the updated driver and release notes at the following link:

https://pcsupport.lenovo.com/us/en/products/laptops-and-netbooks/thinkpad-t-series-laptops/thinkpad-t14s-type-20uh-20uj/downloads/driver-list/component?name=Networking%3A%20Wireless%20LAN

I’ve been seeing a lot of issues as we move to WiFi 6 access points – currently rolling out Juniper Mist AP43s. And in the vast majority of these cases older drivers are the problem. A quick upgrade to the latest and greatest driver is solving the majority of issues. So if you are having issues with a WiFi 6 based access point or client, I would strongly suggest you update your driver before you fire up Wireshark.

Cheers!

The post Lenovo ThinkPad T14 with Realtek 8852AE Wireless Issues first appeared on Michael McNamara.