Posts tagged VMWARE
Cisco Nexus 1000V Upgrade to 4.2(1)SV1(4)
I recently completed an upgrade of our Cisco Nexus 1000V from 4.0(4)SV1(3) to 4.2(1)SV1(4). Before starting the Cisco Nexus 1000V upgrade we first had to upgrade vCenter to 4.1, Update Manager to 4.1 Update 1 and lastly the VMware ESX hosts themselves to 4.1. With all that complete we set out to upgrade the Cisco Nexus 1000V but quickly ran into a few problems.
Pre-Upgrade Script
The Pre-Upgrade Script, a Tcl script which checks your Cisco Nexus 1000V configuration for any potential issues, immediately flagged our port-channel configurations in Check 2.
###############################################################################
## COMPANY NAME: Cisco Systems Inc ##
## Copyright © 2011 Cisco Systems, Inc. All rights Reserved. ##
## ##
## SCRIPT NAME: pre-upgrade-check-4.2(1)SV1(4).tcl ##
## VERSION: 1.0 ##
## DESCRIPTION: This script is applicable to all releases prior to ##
## 4.2(1)SV1(4). ##
## ##
...
...
...
=========
CHECK 2:
=========
Checking for Interface override configurations on Port-channnel Interface ...
###############################################################################
## FAIL ##
## ##
## List of Port-Channel Interface(s) with interface level configuration(s) ##
1: port-channel1 has below overrides.
   mtu 9000
2: port-channel2 has below overrides.
   mtu 9000
3: port-channel3 has below overrides.
   mtu 9000
4: port-channel4 has below overrides.
   mtu 9000
...
...
...
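With more than a handful of port-channels, the FAIL listing from Check 2 gets tedious to read by eye. Below is a minimal Python sketch that pulls the flagged port-channels and their overrides out of saved script output. It assumes the output has been captured to text with one item per line, as in the listing above; the function name `parse_overrides` is made up for illustration and is not part of Cisco's script.

```python
import re

# Matches lines such as "1: port-channel1 has below overrides."
OVERRIDE_HEADER = re.compile(r"^\d+:\s+(port-channel\d+) has below overrides\.$")


def parse_overrides(report):
    """Map each flagged port-channel to its interface-level override lines.

    Assumes one item per line; banner (#), separator (=) and elision (...)
    lines are skipped.
    """
    overrides = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        header = OVERRIDE_HEADER.match(line)
        if header:
            # Start collecting overrides for a newly flagged port-channel.
            current = header.group(1)
            overrides[current] = []
        elif current and line and not line.startswith(("#", "=", "...")):
            overrides[current].append(line)
    return overrides
```

Each key in the returned dictionary is a port-channel that still carries interface-level configuration, which is the list that has to be empty before the check will pass.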
While originally testing the Cisco Nexus 1000V prior to going into production some 463 days earlier, we had played around with enabling Jumbo Frame support within the VM environment and had set the MTU on the individual port-channel configurations to 9000. Now the pre-upgrade script was telling us that we needed to clean this up by removing any specific configuration from the port-channels and applying it to the port-profile configuration instead. I added the system mtu 9000 command to the port-profile but got a few surprises when I tried to remove the MTU command from the port-channel. The first surprise, when I issued “inter po1” followed by “no mtu 9000”, was losing connectivity to the VM guests on that host. I had to manually restart the VEM on the ESX host with the “vem restart” command from a CLI prompt. The second surprise was that “mtu 9000” in the configuration had been replaced by “mtu 1500”. That wasn’t going to work so I had to reach out to Cisco TAC, who immediately recognized the issue and provided a workaround;
On the ESX host stop the VEM
vem stop
Then on the Cisco Nexus 1000V delete the port-channels and associated VEM (I’ll use the first server as an example assuming there are two VSMs installed)
config
no inter po1
no inter po2
no vem 3
On the ESX host start the VEM
vem start
And sure enough it worked just as promised… recreating the port-channels without the MTU commands. Obviously we had to put each ESX host into maintenance mode before we could just stop and start the VEM.
With that taken care of we started upgrading the VEMs using Update Manager. Unfortunately Update Manager only made it through about 6 ESX hosts before it got stuck. We had to manually install the update VIB on the remaining 12 ESX hosts ourselves. We used FileZilla to copy the VIB up to each server and then used PuTTY to SSH into each server and manually update the VEM;
[root@esx-srv1-dc1-pa ~]# esxupdate -b ./cross_cisco-vem-v130-4.2.1.1.4.0.0-2.0.1.vib update
Unpacking cross_cisco-vem-v130-esx_4.2.1.1.. ############################################################# [100%]
Installing cisco-vem-v130-esx ############################################################# [100%]
Running [rpm -e cisco-vem-v120-esx]... ok.
Running [/usr/sbin/vmkmod-install.sh]... ok.
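Repeating the copy and install by hand on a dozen hosts invites typos, so here is a rough Python sketch that only builds the two commands for each host without running anything. It assumes scp and ssh are available from an admin workstation in place of the FileZilla and PuTTY steps we actually used, and that root logins over SSH are permitted. The function name and the host list are hypothetical; the VIB file name and the esxupdate invocation are taken from the session above.

```python
import shlex

# VIB file name as it appears in the esxupdate session above.
VIB = "cross_cisco-vem-v130-4.2.1.1.4.0.0-2.0.1.vib"


def vem_update_commands(host, vib=VIB, user="root"):
    """Return the copy and install commands for one ESX host as argument lists.

    Nothing is executed here; the host is expected to be in maintenance
    mode before the install command is run.
    """
    target = f"{user}@{host}"
    copy = ["scp", vib, f"{target}:./{vib}"]
    install = ["ssh", target, f"esxupdate -b ./{vib} update"]
    return [copy, install]


if __name__ == "__main__":
    # Hypothetical host list; replace with the hosts that still need the VIB.
    for host in ["esx-srv1-dc1-pa"]:
        for command in vem_update_commands(host):
            print(shlex.join(command))
```

Printing the commands first makes it easy to review them, and to confirm that each host is in maintenance mode, before pasting them into a shell.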
With all the ESX hosts upgraded we had to physically reboot the vCenter server to get the currently running VUM task to fail so we could complete the upgrade from the Cisco Nexus 1000V.
Next we launched the upgrade application and before long we had the standby VSM upgraded to 4.2(1)SV1(4). Here’s where we ran into another small scare. After the upgrade of the standby VSM the upgraded VEMs are supposed to re-register to the newly upgraded VSM. We waited about 5 minutes and none of the VEMs had moved from the primary VSM running 4.0(4)SV1(3) to the standby VSM that was now running 4.2(1)SV1(4). It was only approximately 15-20 minutes later (while searching Google for some hint) that the VEMs just up and started to connect to the newly upgraded standby VSM.
Cheers!
New Data Center – Where have I been?
I thought I would post a few quick words on where I’ve been for the past 2 months (certainly not writing quality content for this blog). The past 60 days have been very hectic as I’ve started down the final stretch of designing, building and lighting a new data center. Thankfully the team and I are no strangers to moving computer rooms or constructing new buildings so we’re keenly aware of all the technical details needed to be successful in such a large endeavor.
I have so many short stories to share but no time to share them… In any event I’m now getting up to speed with a lot of new equipment, specifically Cisco’s Nexus gear.
What equipment did we use?
- Cisco Nexus 7010
- Cisco Nexus 5010
- Cisco Nexus 2148
- Cisco Catalyst 3750E
- Cisco Catalyst 2960G
- Cisco ASA5520
- Cisco ACE 4710
- Cisco AS5300 (yes we still have some dial-up users/vendors)
- Cisco 7301 Router
- Cisco 2821 Router
What racks did we use for the network equipment?
- Liebert Knurr Racks
- Liebert MPH/MPX PDUs
What equipment did we use for the servers/blades?
- HP Rack 10000 G2
- HP Rack PDU (AF503A)
- HP IP KVM Console (AF601A)
- HP BladeSystem c7000 Enclosure
- HP Virtual Connect Flex-10 Interconnect
- HP SAN 8Gb Interconnect
- HP BL460c G6
- HP BL490c G6
- HP DL380 G6
- HP DL360 G6
What are we using for storage?
- IBM XIV System Storage (SAN) (w/4 1Gbps iSCSI replication ports)
- IBM SAN80B-4 SAN Switch
Additional miscellaneous equipment;
- MRV LX-4048T (terminal server)
- Brother P-Touch PT-2100 / Brady ID PRO Plus label makers
As some of you might know we selected Cisco as the network electronics vendor and have implemented their Cisco Nexus 7010 switches as our core, followed by the Nexus 5010 switches as distribution to the Nexus 2148 (FEX) switches in a top-of-rack configuration. We also used Catalyst 2960G switches for our management/out-of-band network given that the Nexus 2148 only supports 1000BaseT, with no 10Mbps or 100Mbps connectivity. Of course Cisco is in the process of releasing the Nexus 2248 which supports 100/1000Mbps connectivity to edge devices. We chose to use the HP Virtual Connect Flex-10 in our VM enclosures and we’ll use the Cisco 3120X in our non-VM enclosures. We’ve also installed and configured the Nexus 1000V in coordination with our VMware vSphere 4 environment. We decided that the CEE/DCE/FCoE revolution wasn’t quite here yet, or perhaps we weren’t quite ready for it, so we stayed with a traditional Fibre Channel infrastructure built around two IBM (OEM Brocade) 80-port 8Gbps SAN switches. For SAN replication we’ll be using multiple 1Gbps iSCSI ports over a 10GE WAN. We ended up with an IBM XIV so we’ll have to see if it can keep up with all the traffic that’s going to be thrown its way.
So there should certainly be no shortage of material to talk about with all this new equipment, however, I’m certainly going to be very busy the next six months.
Here are some pictures of the cage (under 800 sq ft), if interested. You’ll notice the chair and the upgrade that we performed on it in the last two pictures.
Cheers!
vSphere SCSI reservation conflict
Sep 1 13:04:35 mdcc01h10r242 vmkernel: 0:00:26:02.384 cpu4:4119)WARNING: ScsiDeviceIO: 1374: I/O failed due to too many reservation conflicts. naa.600508b4000547cc0000b00001540000 (920 0 3)
Sep 1 13:04:40 mdcc01h10r242 vmkernel: 0:00:26:07.400 cpu6:4119)WARNING: ScsiDeviceIO: 1374: I/O failed due to too many reservation conflicts. naa.600508b4000547cc0000b00001540000 (920 0 3)
Sep 1 13:04:40 mdcc01h10r242 vmkernel: 0:00:26:07.400 cpu6:4119)WARNING: Partition: 705: Partition table read from device naa.600508b4000547cc0000b00001540000 failed: SCSI reservation conflict
A quick examination of the other ESX hosts revealed the following;
Sep 1 13:04:26 mdcc01h09r242 vmkernel: 21:22:13:25.727 cpu10:4124)WARNING: FS3: 6509: Reservation error: SCSI reservation conflict
Sep 1 13:04:31 mdcc01h09r242 vmkernel: 21:22:13:30.715 cpu12:4124)WARNING: FS3: 6509: Reservation error: SCSI reservation conflict
Sep 1 13:04:36 mdcc01h09r242 vmkernel: 21:22:13:35.761 cpu9:4124)WARNING: FS3: 6509: Reservation error: SCSI reservation conflict
We had a SCSI reservation issue that was locking out the LUN from any of the ESX hosts. The immediate suspect was the VCB host as it was the only other host that was being presented the same datastores (LUNs) as the ESX hosts from the SAN (HP EVA 6000).
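When several hosts are logging at once, a quick tally of the reservation-conflict warnings shows which hosts are affected and which device they name. The following is a minimal Python sketch, assuming the vmkernel messages have been collected into a single text blob in the syslog layout shown above; the function name `summarize_conflicts` is made up for illustration.

```python
import re
from collections import Counter

# "Sep 1 13:04:35 <host> vmkernel: <message>" as seen in the excerpts above.
LOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+vmkernel:\s+(?P<msg>.*)$"
)
# Device identifiers of the form naa.<hex digits>.
NAA_ID = re.compile(r"naa\.[0-9a-f]+")


def summarize_conflicts(log_text):
    """Count reservation-conflict warnings per host and per named device."""
    hosts = Counter()
    devices = Counter()
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue
        message = match.group("msg")
        if "reservation conflict" not in message.lower():
            continue
        hosts[match.group("host")] += 1
        # Not every warning names the device (the FS3 lines do not).
        for device in NAA_ID.findall(message):
            devices[device] += 1
    return hosts, devices
```

If every affected host points at the same naa identifier, that is a reasonable hint that a single LUN is locked and not the whole array.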
We rebooted the VCB host and then issued the following command to reset the LUN from one of the ESX hosts;
vmkfstools -L lunreset /vmfs/devices/disks/naa.600508b4000547cc0000b00001540000
After issuing the LUN reset we observed the following message in the log;
Sep 1 13:04:40 mdcc01h10r242 vmkernel: 0:00:26:07.400 cpu9:4209)WARNING: NMP: nmp_DeviceTaskMgmt: Attempt to issue lun reset on device naa.600508b4000547cc0000b00001540000. This will clear any SCSI-2 reservations on the device.
The ESX hosts were almost immediately able to see the datastore and the problem was resolved.
We believe the problem occurred when the VCB host tried to back up multiple virtual machines from the same datastore (LUN) at the same time. The VCB host appears to have held its lock on the LUN for too long, causing the SCSI queue to fill up on the ESX hosts. This is new to us and to me so we’re still trying to figure it out.
Cheers!
References;
http://kb.vmware.com/kb/1009899
http://www.vmware.com/files/pdf/vcb_best_practices.pdf









