We just completed an upgrade of our core Cisco Nexus 7010s from 4.2(6) to 5.2(4). We followed the process laid out in the upgrade guide and performed an ISSU (In-Service Software Upgrade) since we have dual supervisors in both switches. While there was no service interruption during the upgrade, the process did take about 45 minutes per switch (we had 7 cards in each chassis), so be sure to plan your maintenance window accordingly.
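If you want to gauge the impact before committing to a window, NX-OS can run the same compatibility checks without actually installing anything. A minimal sketch, assuming the 5.2(4) images are already sitting on bootflash:

show install all impact kickstart bootflash:n7000-s1-kickstart.5.2.4.bin system bootflash:n7000-s1-dk9.5.2.4.bin

This prints the same per-module impact table (non-disruptive vs. disruptive, rolling vs. reset) that you'll see in the full install output further down.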
I did notice a very odd problem while trying to copy the software to the switch via TFTP: it was insanely slow, transferring at what appeared to be 8-16 kbps. I issued a Ctrl-C, tried an FTP download instead, and it flew along; I was done in minutes. In both cases I used the default VRF. I'm curious to understand why TFTP was so slow compared to FTP. We use TFTP pretty heavily in our environment and we've never had a problem with any other equipment, so I suspect the Cisco Nexus 7010s and not the CentOS Linux server that acts as our central TFTP server.
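For anyone who wants to reproduce the comparison, the two transfers looked roughly like this (the server address is just a placeholder; both ran in the default VRF):

copy tftp://10.1.1.50/n7000-s1-dk9.5.2.4.bin bootflash: vrf default
copy ftp://admin@10.1.1.50/n7000-s1-dk9.5.2.4.bin bootflash: vrf default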
The next big hurdle will be finding the downtime to apply all the EPLD/FPGA firmware upgrades for each card. I understand the EPLD upgrade is disruptive, but I'm trying to determine how big a maintenance window I need in order to safely accomplish the task on both core Cisco 7010s, doing them one at a time. One Cisco resource I talked with said I would need a minimum 4-hour maintenance window per chassis; there's no way in hell I'm going to get a four-hour maintenance window in a healthcare environment.
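In the meantime the work can at least be scoped without touching anything; NX-OS will report the current EPLD versions and what a given EPLD image would change (the EPLD image filename below is illustrative):

show version module 1 epld
show install all impact epld bootflash:n7000-s1-epld.5.2.4.img

The impact output flags which modules actually need an update, which should help size the window.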
Here are the commands and output in case anyone is curious or would like to compare notes.
sw-n7010-ccr.acme.org# install all kickstart bootflash:n7000-s1-kickstart.5.2.4.bin system bootflash:n7000-s1-dk9.5.2.4.bin

Verifying image bootflash:/n7000-s1-kickstart.5.2.4.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS

Verifying image bootflash:/n7000-s1-dk9.5.2.4.bin for boot variable "system".
[####################] 100% -- SUCCESS

Verifying image type.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "bios" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:/n7000-s1-kickstart.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "cmp" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "cmp-bios" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Performing module support checks.
[####################] 100% -- SUCCESS

Notifying services about system upgrade.
[####################] 100% -- SUCCESS

Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive       rolling
     2       yes  non-disruptive       rolling
     3       yes  non-disruptive       rolling
     4       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset
     7       yes  non-disruptive       rolling

Images will be upgraded according to following table:
Module       Image  Running-Version(pri:alt)         New-Version  Upg-Required
------  ----------  ------------------------  ------------------  ------------
     1      lc1n7k  4.2(6)                    5.2(4)               yes
     1        bios  v1.10.6(11/04/08):        v1.10.6(11/04/08)    no
     2      lc1n7k  4.2(6)                    5.2(4)               yes
     2        bios  v1.10.6(11/04/08):        v1.10.6(11/04/08)    no
     3      lc1n7k  4.2(6)                    5.2(4)               yes
     3        bios  v1.10.6(11/04/08):        v1.10.6(11/04/08)    no
     4      lc1n7k  4.2(6)                    5.2(4)               yes
     4        bios  v1.10.6(11/04/08):        v1.10.6(11/04/08)    no
     5      system  4.2(6)                    5.2(4)               yes
     5   kickstart  4.2(6)                    5.2(4)               yes
     5        bios  v3.19.0(03/31/09):        v3.19.0(03/31/09)    no
     5         cmp  4.2(1)                    5.2(4)               yes
     5    cmp-bios  02.01.05                  02.01.05             no
     6      system  4.2(6)                    5.2(4)               yes
     6   kickstart  4.2(6)                    5.2(4)               yes
     6        bios  v3.19.0(03/31/09):        v3.19.0(03/31/09)    no
     6         cmp  4.2(1)                    5.2(4)               yes
     6    cmp-bios  02.01.05                  02.01.05             no
     7      lc1n7k  4.2(6)                    5.2(4)               yes
     7        bios  v1.10.6(11/04/08):        v1.10.6(11/04/08)    no

Additional info for this installation:
--------------------------------------

Do you want to continue with the installation (y/n)?  [n] y

Install is in progress, please wait.

Syncing image bootflash:/n7000-s1-kickstart.5.2.4.bin to standby.
[####################] 100% -- SUCCESS

Syncing image bootflash:/n7000-s1-dk9.5.2.4.bin to standby.
[####################] 100% -- SUCCESS

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 1: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 2: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 3: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 4: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 5: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 7: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

2012 Mar 21 04:52:52 sw-n7010-ccr %$ VDC-1 %$ %PLATFORM-2-MOD_REMOVE: Module 5 removed (Serial number JAXXXXXXXXX)
2012 Mar 21 04:58:30 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP

Module 5: Waiting for module online.
 -- SUCCESS

2012 Mar 21 04:59:42 sw-n7010-ccr %$ VDC-1 %$ %IDEHSD-STANDBY-2-MOUNT: logflash: online

Notifying services about the switchover.
[####################] 100% -- SUCCESS

"Switching over onto standby".
writing reset reason 7, SAP(93): Swover due to install

NX7 SUP Ver 3.19.0
Serial Port Parameters from CMOS
PMCON_1: 0x200
PMCON_2: 0x0
PMCON_3: 0x3a
PM1_STS: 0x101
Performing Memory Detection and Testing
Testing 1 DRAM Patterns
Total mem found : 4096 MB
Memory test complete.
NumCpus = 2.
Status 61: PCI DEVICES Enumeration Started
Status 62: PCI DEVICES Enumeration Ended
Status 9F: Dispatching Drivers
Status 9E: IOFPGA Found
Status 9A: Booting From Primary ROM
Status 98: Found Cisco IDE
Status 98: Found Cisco IDE
Status 90: Loading Boot Loader
Reset Reason Registers: 0x1 0x0
Filesystem type is ext2fs, partition type 0x83

GNU GRUB version 0.97

Autobooting bootflash:/n7000-s1-kickstart.5.2.4.bin bootflash:/n7000-s1-dk9.5.2.4.bin...
Filesystem type is ext2fs, partition type 0x83
Booting kickstart image: bootflash:/n7000-s1-kickstart.5.2.4.bin.......................
............................................................................Image verification OK

INIT: version 2
Checking all filesystems..r.r.r.. done.
Loading system software
/bootflash//n7000-s1-dk9.5.2.4.bin read done
Uncompressing system image: bootflash:/n7000-s1-dk9.5.2.4.bin Wed Mar 21 05:03:26 EDT 2012
blogger: nothing to do.
..done Wed Mar 21 05:03:30 EDT 2012
Load plugins that defined in image conf: /isan/plugin_img/img.conf
Loading plugin 0: core_plugin...
num srgs 1
0: swid-core-supdc3, swid-core-supdc3
num srgs 1
0: swid-supdc3-ks, swid-supdc3-ks
INIT: Entering runlevel: 3

User Access Verification
SW-N7010-CCR(standby) login:
Next we moved over to the standby supervisor, which was slot 5 at the time, to observe the upgrade complete:
Continuing with installation, please wait

2012 Mar 21 04:58:30 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP

Module 5: Waiting for module online.
 -- SUCCESS

2012 Mar 21 04:59:42 sw-n7010-ccr %$ VDC-1 %$ %IDEHSD-2-MOUNT: logflash: online
2012 Mar 21 04:59:47 sw-n7010-ccr %$ VDC-1 %$ Mar 21 04:59:47 %KERN-2-SYSTEM_MSG: Switchover started by redundancy driver - kernel
2012 Mar 21 04:59:47 sw-n7010-ccr %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).
2012 Mar 21 04:59:47 sw-n7010-ccr %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 5 is becoming active.
2012 Mar 21 04:59:49 sw-n7010-ccr %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.
2012 Mar 21 05:00:00 sw-n7010-ccr %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Load time of /isan/etc/routing-sw/cli/metro.cli_: 695539ms - ascii_cfg_server
2012 Mar 21 05:04:52 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
2012 Mar 21 05:06:00 sw-n7010-ccr-b %$ VDC-1 %$ %IDEHSD-STANDBY-2-MOUNT: logflash: online

Module 1: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 2: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 3: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 4: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 7: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 5: Upgrading CMP image.
Warning: please do not reload or power cycle CMP module at this time.
[####################] 100% -- SUCCESS

Module 6: Upgrading CMP image.
Warning: please do not reload or power cycle CMP module at this time.
[####################] 100% -- SUCCESS

Recommended action:: "Please reload CMP(s) manually to have it run in the newer version.".

Install has been successful.
With the upgrade complete, the only thing we needed to do was restart the CMPs in slots 5 and 6:
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2012, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

sw-n7010-ccr.acme.org# reload cmp module 5
This command will reload the CMP on the supervisor in slot 5. Continue (y/n)? [no] y
sw-n7010-ccr.acme.org#
2012 Mar 21 05:23:41 sw-n7010-ccr-b %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_WENT_DOWN: Connectivity Management processor (on module 5) went DOWN
2012 Mar 21 05:24:50 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP

sw-n7010-ccr.acme.org# reload cmp module 6
This command will reload the CMP on the supervisor in slot 6. Continue (y/n)? [no] y
sw-n7010-ccr.acme.org#
2012 Mar 21 05:25:50 sw-n7010-ccr-b %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_WENT_DOWN: Connectivity Management processor (on module 6) went DOWN
2012 Mar 21 05:26:57 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
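For completeness, a few sanity checks afterwards confirm everything landed where it should (output omitted here):

show version
show module
show install all status

The last command replays the installer's status log, so you can verify that every module completed.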
Cheers!
Gabe says
Being in a health care environment myself, I know how hard it is to obtain downtimes. I'm actually going in this evening to upgrade a couple of Nortel 8600s from 4.1.7.2 to 5.1.5.1. It's rare that we get to perform an upgrade that's not overnight on a Sunday when a clinical system is having its downtime... How does your team/organization handle downtimes?
I've always known TFTP to transfer very slowly. My guess is that it's a limitation of the switch and not the TFTP server. I've especially noticed it takes forever to back up the binary and ASCII configuration files from the Avaya/Nortel ERS 5000 series switches. I usually run the job with Enterprise Switch Manager and walk away for about an hour.
Thanks for sharing your notes.
Michael McNamara says
Hi Gabe,
It might be a little late now, but you might want to consider 5.1.7.0 in place of 5.1.5.1. We had a few LANE lockup issues with our 8630GBR cards that, while not completely resolved in 5.1.7.0, are supposed to be less likely.
The most we ever need is a few minutes' outage within a set maintenance window. We describe it to the users as a 5-minute roving outage between 4 AM and 5 AM. It's quite a job to upgrade 47 switches/stacks over a 60-minute window, but we have it down to a science thanks to some Perl scripts. We usually perform all our maintenance activities on a Wednesday morning (it helps with vendor support and doesn't ruin everyone's weekend).
If you've ever looked at those backup files, they are pretty sizable. It takes us about 4 hours to back up all 533 switches/stacks in our network.
Cheers!
Gabe says
It’s a bit too late, but we can consider it for future 8600s! The switches we upgraded do not have the 8630GBRs. They are still running older E and pre-E modules. :-X
Going from 4.1.7.2 to 5.1.5.1 caused about a 15 minute impact to the users.
The biggest problem we have right now is that in one building (4.1.7.2) we have issues with the ARP tables not replicating, which requires both 8600s to be rebooted (quarterly, it seems). Plus the closet has experienced everything from flooding to numerous hardware failures within the past few years. The closet environment is otherwise better than most of our other closets! Rumor has it that the floor is haunted.
Michael McNamara says
I’ve had a few FDB/MAC and ARP issues even with 4.1.8.3 software.
I have not yet seen any of those issues with 5.1.5.1 or 5.1.7.0 software.
Cheers!
Nate says
Do you have the default Control Plane Policing turned on?
Transfers over the default VRF are going to be brutal under the default policy.
I tweaked my CoPP to allow faster transfers inband, but doing transfers over the management VRF neatly sidesteps the issue.
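Something like this is what I mean; the first two commands show whether the default policy is attached and what it's doing, and the copy example (placeholder server address) goes over mgmt0 in the management VRF instead:

show copp status
show policy-map interface control-plane
copy ftp://admin@10.1.1.50/n7000-s1-dk9.5.2.4.bin bootflash: vrf management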
EPLD upgrades take a long time. I'm watching a switch upgrade itself right now: it's only got 3 linecards (plus sups and fabrics), it's been 1.5 hours, and it's just finished the linecards.
Michael McNamara says
Hi Nate,
I'm not using the management VRF on my Cisco Nexus 7010s. I'm doing all the management through the inband IP interfaces (the default VRF). I'll have a look at the configuration... I totally forgot about all that extra policy on the Cisco Nexus.
Thanks for the feedback regarding the EPLD. I've got 5 cards and 2 sups, so I'm guessing it's going to take a long, long time. I'm not looking forward to that.
I'll be doing an upgrade to 5.1.3 on 2 pairs of Cisco Nexus 5010s next week. It's been difficult to get answers outside of the information in the release notes. Unfortunately we're still running 4.x, so there won't be any ISSU that morning.
Thanks for the tip!
Nate says
Forgot to mention: there's a "parallel" option that will let you upgrade multiple linecards at once. That could shorten your window dramatically, provided you're not going to run into any issues with losing everything simultaneously for various periods of time.
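If memory serves, the syntax is along these lines (the EPLD image filename is just an example):

install all epld bootflash:n7000-s1-epld.5.2.4.img parallel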
The way I laid out my switches, losing just one linecard at a time guarantees I'm not going to get into any black holes.
If I lose both my 10Gb cards during the upgrade process for 40 minutes, but my 1Gb card is only down for 30 minutes, I could end up with a bad scenario where this switch is advertising a summary route for networks it isn't really connected to any more. If only one of my 10Gb links is down, then I'll still have a cross/uplink with which to peer.
So I'm sticking with serial. The other option is to shut down all the interfaces prior to doing a parallel upgrade and then turn them all back on again afterwards, but in my environment the lengthy EPLD process is more of an annoyance than anything else here, 2 hours later.
Michael McNamara says
Thanks again Nate!
Yedi says
Hi Michael,
I recently tried to upgrade my Nexus 7010 from 6.1.1 to 6.1.2 and failed. I only have a single supervisor (SUP-1) and I used the ISSU method with the "install all" command. I got this kind of message:
“unable to install log files”
“errno=13”
I tried it once again and still got the same message. After these two failures I tried to reload my system, and after entering the "reload" command I tried to break the normal boot sequence by entering Ctrl-C. This is where the big problem occurred. My system couldn't get to the loader> prompt and was stuck at "Status 90: Loading Boot Loader". Before this, I had done this "breaking the boot sequence" thing twice and it always went smoothly.
After searching through a lot of documentation, I suspect the problem is that my BIOS got corrupted, and I suspect it's because of the failure in the upgrade process. What I want to ask you:
1. From what I know, upgrading NX-OS also means upgrading the BIOS. But since my upgrade process failed, I assume my BIOS didn't change. Am I right?
2. What is the reason behind my upgrade failure, based on that errno=13 message?
3. Have you encountered this “status 90” problem before?
I hope you can help me in this matter. Thanks in advance.
Best regards,
Yedi
Michael McNamara says
I would strongly suggest you contact Cisco TAC for support. Unfortunately I haven’t performed that many upgrades on the Cisco Nexus 7010 so I can’t really offer any advice. I’ve never personally encountered the errors you are reporting.
Sorry.
Yedi says
Hi Michael,
It's OK, I will contact Cisco TAC soon. Thanks for your response.
Best regards,
Yedi