Secondary Data Center – Where have I been?

It was just over 2 years ago that I designed and stood up our first off-campus data center in Philadelphia, PA. Since that time we’ve completely vacated our original data center, migrating all the servers, applications and services out to our new data center. Last month we relocated our offices, leaving the old data center and office space behind forever. The new office space is very nice and has a lot of (much needed) conference rooms, all of which have built-in audio/video capabilities with either an overhead projector or flat screen TV. I’m still hoping to have a LAN party someday on those 61″ monster displays, perhaps with Call of Duty: Black Ops 2?

In June we started deploying our secondary data center with the intent of providing our own business continuity and disaster recovery services for our tier 1 applications including all our data storage needs. The design allows us the flexibility to utilize both DCs in an active/active configuration with the ability to move workloads (virtual machines) between DCs. While the design allows us that option we’re still testing how we’re going to handle all the different disaster scenarios – blade, enclosure, rack, SAN, cage, entire data center, etc. While our primary data center rings in at 800 sq ft our secondary data center is only 300 sq ft. This is possible because we’re utilizing a traditional disaster recovery model for our big box non-tier 1 applications that for one reason or another aren’t virtualized. This helps reduce the number of lazy assets hanging around and helps control some of the budget numbers. I totally expect the number of big box applications to continue to shrink over time as more and more application vendors embrace virtualization.

We’ve had pretty good success with the design of our first data center so we only made a few corrections. There are a lot of logistics that need to be considered in any design, especially around all the power and cooling requirements.

The Equipment

What equipment did we use? We already deployed Cisco at our primary data center so we decided to stay with Cisco at our secondary data center.

  • Cisco Nexus 7010
  • Cisco Nexus 5010
  • Cisco Nexus 2248
  • Cisco Nexus 1000V
  • Cisco Catalyst 3750X
  • Cisco Catalyst 2960G
  • Cisco ASA5520
  • Cisco ACE 4710
  • Cisco 3945 Router (Internet)
  • Cisco 2811 Router (internal T1 locations)

What racks did we use for the network equipment?

  • Liebert Knurr Racks
  • Liebert MPH/MPX PDUs

What equipment did we use for the servers/blades?

  • HP Rack 10000 G2
  • HP Rack PDU (AF503A)
  • HP IP KVM Console (AF601A)
  • HP BladeSystem C7000 Enclosure
  • HP Virtual Connect Flex-10 Interconnect
  • HP SAN 8Gb Interconnect
  • Cisco Catalyst 3120X
  • HP BL460c G7
  • HP BL620c G7
  • HP DL380 G8
  • HP DL360 G8

What are we using for storage?

  • IBM XIV System Storage Gen3 (SAN) (w/4 1Gbps iSCSI replication ports)
  • IBM SAN80B-4 SAN Switch
  • EMC DD860 (Disk-Disk backup via Symantec NetBackup)

Additional miscellaneous equipment:

  • MRV LX-4048T (terminal server)

We had some challenges with designing our secondary data center due to the density of our equipment. We had to stay under the maximum kW per sq ft load that the room (data center) was designed to handle. This is a simple calculation based on the kW utilization of the equipment to determine if there is adequate power and cooling available to meet that demand. We also had to maintain an N+1 design so we really can’t consume more than 40% of our capacity, leaving 10% for reserve. While some vendors charge a flat fee for the space (includes power) others charge per kWh, so it’s very important to understand what type of demand you’re going to be placing on the data center.
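The calculation described above can be sketched in a few lines of Python. Only the 300 sq ft floor area and the 40% ceiling come from this post; the per-rack loads, the provisioned capacity, the density limit and the function names are hypothetical placeholders, so treat this as an illustration rather than our actual numbers.

```python
def power_density_kw_per_sqft(loads_kw, floor_area_sqft):
    """Total equipment load divided by floor area."""
    return sum(loads_kw) / floor_area_sqft


def within_budget(loads_kw, capacity_kw, max_utilization=0.40):
    """True if the total load stays at or below the utilization ceiling."""
    return sum(loads_kw) <= capacity_kw * max_utilization


# Hypothetical per-rack loads in kW (placeholders, not measured values).
rack_loads_kw = [4.5, 6.0, 5.5, 3.0]
floor_area_sqft = 300             # secondary data center size from the post
room_capacity_kw = 60.0           # hypothetical provisioned capacity
design_limit_kw_per_sqft = 0.15   # hypothetical room design limit

density = power_density_kw_per_sqft(rack_loads_kw, floor_area_sqft)
print(f"Total load: {sum(rack_loads_kw):.1f} kW")
print(f"Density: {density:.3f} kW/sq ft (limit {design_limit_kw_per_sqft})")
print("Under density limit:", density <= design_limit_kw_per_sqft)
print("Under 40% ceiling:", within_budget(rack_loads_kw, room_capacity_kw))
```

Swap in the nameplate or measured draw of your own equipment and the limits from your provider’s contract before drawing any conclusions.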

My Design

We stood up a pair of Ciena 5200s from Zayo (formerly AboveNet) providing us a DWDM ring with 4 wavelengths between our primary data center and secondary data center. We’re using 2 wavelengths for the IP network between 2 pairs of Cisco Nexus 7010s and 2 wavelengths for the SAN Fibre Channel network between 2 pairs of IBM SAN switches. We have the option of adding upwards of 4 additional wavelengths before we need to add any hardware so we have room for growth. The 4 wavelengths are diverse between an east and west path but they are not protected, so it’s up to the higher layer protocols to provide the redundancy and failover.

Not visible in the diagram above is a 10GE WAN ring that connects all our hospitals together. The primary and secondary data centers are also tied into that ring via multiple peering points for redundancy. You might be asking yourself why I’m using a Cisco 3750E as a termination switch in our primary data center. At the time we deployed our Cisco Nexus 7010s they didn’t support the 10GBase-ER SFP+ optic so I had to use the Cisco 3750E (with RPSU) as a glorified media transceiver/converter from 10GBase-ER to 10GBase-SR. The Cisco Nexus 7010 now has a 10GBase-ER SFP+ optic available so we didn’t need to use the Cisco 3750E in the secondary data center.

We are essentially stretching a Layer 2 vPC connection between the 2 data centers. It’s possible that some folks will get excited at the mention of Layer 2 between the data centers but it’s the best solution for us at this time and it certainly has pros and cons like everything in networking. We looked at potentially running OTV between the Cisco Nexus 7010s but ultimately decided to use a vPC configuration. We are only stretching the virtual machine VLANs that we need between the data centers.

My Thoughts

There’s a lot of work required to design any data center or even an ICR (Intermediate Communications Room), CCR (Central Communications Room), MDF (Main Distribution Frame) or IDF (Intermediate Distribution Frame). You’re immediately confronted with space, power and cooling challenges, never mind coming up with the actual IP addressing scheme, VLAN assignments, routing vs bridging, etc. You need to determine how much cabling you’ll need, both CAT6 and fiber; perhaps you’ll look to use twinax DAC (Direct Attach Copper) cables for your 10GE connections. Let’s not forget to include the ladder racks, basket trays, fiber conduits, PDUs, out-of-band networking, etc.

You also need to design the data center as if it was 300+ miles away… license those iLOs (HP Integrated Lights Out), purchase IP enabled KVMs, purchase console/terminal servers (Opengear or MRV) and wire everything up as if you will never have the opportunity to visit it again. We’ve had a few issues in the past few years that were quickly (less than 15 minutes) resolved thanks to having all our iLOs licensed, all our KVMs IP enabled, all our console/serial ports connected to a console/terminal server and the ability to dial-up into the console/terminal server should the problem get really bad.

Here’s a short story… We had a number of billing issues in the first few months of our contract with our current primary data center provider and the data from our Liebert PDUs, HP PDUs, and HP C7000 enclosures was invaluable in calling into question the numbers that were being reported to us. In all honesty, when they told me we were consuming 53A on a 50A circuit I knew that something was grossly wrong with their math. In the end the provider admitted that their numbers were grossly wrong and the corrected numbers were in line with the data we collected from our equipment.
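The kind of sanity check described in that story is easy to automate. Here is a minimal sketch, assuming you can pull per-PDU current readings from your own equipment; the 53A and 50A figures are from the story, while the PDU readings, the 10% tolerance and the function name are hypothetical.

```python
def check_reported_load(reported_amps, breaker_amps, measured_amps, tolerance=0.10):
    """Compare a provider-reported current against our own PDU readings.

    Returns a list of human-readable findings; an empty list means the
    reported figure looks plausible.
    """
    findings = []
    measured_total = sum(measured_amps)
    if reported_amps > breaker_amps:
        findings.append(
            f"reported {reported_amps}A exceeds the {breaker_amps}A breaker rating"
        )
    if measured_total > 0:
        deviation = abs(reported_amps - measured_total) / measured_total
        if deviation > tolerance:
            findings.append(
                f"reported {reported_amps}A differs from measured "
                f"{measured_total:.1f}A by {deviation:.0%}"
            )
    return findings


# 53A and 50A come from the story above; the PDU readings are hypothetical.
pdu_readings_amps = [11.2, 9.8, 10.5]
for finding in check_reported_load(53, 50, pdu_readings_amps):
    print(finding)
```

Either finding on its own is a good reason to ask the provider how their number was measured.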

It’s never a good idea to skimp on the documentation and I really advise taking lots of pictures. You’d be surprised how quickly you can forget what the back of a specific rack looks like when you’re trying to walk Smart Hands through replacing a component at 2AM.

Cheers!

Cisco Nexus 3548 with Algorithm Boost – Hands-on

I was speaking with my Cisco sales engineer this past week and he was regaling me with tales of the Cisco Nexus 3548, Cisco’s latest low latency switch which is specifically targeted at the financial vertical, where the proliferation of algorithmic trading requires the lowest latency possible.

The low-latency space has really been heating up over the past few years. I believe the Cisco Nexus 3548, and specifically the Monticello ASIC, has been in development for almost 2 years and was released at the same time as Arista’s 7150S. Unfortunately I don’t see myself running any of these anytime soon… I had to fight to get 10Gbps in my data centers and I usually have to convince customers to go with 1Gbps to the desktop as opposed to 100Mbps.

Shaun had the opportunity to test one of only a few test/evaluation units available and was kind enough to compile a video for all those interested.

You’ll find plenty of information from Cisco and even from the renowned Ivan Pepelnjak and Brad Reese.

Thanks Shaun!

Cisco Nexus 7010 ISSU Upgrade to 5.2(4)

We just completed an upgrade of our core Cisco Nexus 7010s from 4.2(6) to 5.2(4). We followed the process laid out in the upgrade guide and performed an ISSU (In-Service Software Upgrade) since we have dual supervisors in both switches. While there was no service interruption during the upgrade, the process did take about 45 minutes per switch (we had 7 cards in each chassis), so be sure to plan your maintenance window accordingly.

I did notice a very odd problem while trying to copy the software to the switch via TFTP – it was insanely slow. It was copying the software at what appeared to be 8-16kbps. I issued a Ctrl-C and tried an FTP download and it flew along and I was done in minutes. In both cases I utilized the default VRF. I’m curious to understand why the TFTP was so slow compared to the FTP. We utilize TFTP pretty heavily in our environment and we’ve never had a problem with any other equipment so I suspect the Cisco Nexus 7010s and not the CentOS Linux server that acts as our central TFTP server.
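One thing that is well defined is how TFTP moves data: per RFC 1350 it sends a single 512-byte block and waits for the acknowledgement before sending the next, so throughput can never exceed one block per round trip. The sketch below is only a back-of-the-envelope check of that ceiling. The 1 ms round-trip time is a hypothetical LAN figure, and it assumes the 8-16kbps observed above means kilobits per second. It does not identify the root cause.

```python
TFTP_BLOCK_BYTES = 512  # default TFTP data block size (RFC 1350)


def max_throughput_bps(rtt_seconds, block_bytes=TFTP_BLOCK_BYTES):
    """Upper bound on lock-step throughput: one block per round trip."""
    return block_bytes * 8 / rtt_seconds


def implied_seconds_per_block(throughput_bps, block_bytes=TFTP_BLOCK_BYTES):
    """Time each block/ACK cycle must be taking at an observed rate."""
    return block_bytes * 8 / throughput_bps


# Hypothetical 1 ms LAN round trip: the protocol ceiling is still ~4 Mbps.
print(f"{max_throughput_bps(0.001) / 1e6:.1f} Mbps ceiling at 1 ms RTT")

# Observed 8-16 kbps (assuming kilobits per second, as written above).
for observed_kbps in (8, 16):
    cycle = implied_seconds_per_block(observed_kbps * 1000)
    print(f"{observed_kbps} kbps implies {cycle:.3f} s per block")
```

A quarter to half a second per block is far longer than a typical LAN round trip, so the lock-step design alone would not explain a rate that low. It does explain why TFTP suffers much more than FTP (which runs over TCP with a sliding window) whenever something adds per-packet delay or loss.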

The next big hurdle will be finding the downtime to apply all the EPLD/FPGA firmware upgrades for each card. I understand the EPLD upgrade is disruptive but I’m trying to determine how big a maintenance window I need in order to safely accomplish the task on both core Cisco 7010s – doing them one at a time. One Cisco resource I talked with said I would need a minimum 4 hour maintenance window per chassis – there’s no way in hell I’m going to get a four hour maintenance window in a healthcare environment.

Here are the commands and output in case anyone is curious or would like to compare notes.

sw-n7010-ccr.acme.org# install all kickstart bootflash:n7000-s1-kickstart.5.2.4.bin system bootflash:n7000-s1-dk9.5.2.4.bin

Verifying image bootflash:/n7000-s1-kickstart.5.2.4.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS

Verifying image bootflash:/n7000-s1-dk9.5.2.4.bin for boot variable "system".
[####################] 100% -- SUCCESS

Verifying image type.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "bios" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:/n7000-s1-kickstart.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "cmp" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Extracting "cmp-bios" version from image bootflash:/n7000-s1-dk9.5.2.4.bin.
[####################] 100% -- SUCCESS

Performing module support checks.
[####################] 100% -- SUCCESS

Notifying services about system upgrade.
[####################] 100% -- SUCCESS

Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive       rolling
     2       yes  non-disruptive       rolling
     3       yes  non-disruptive       rolling
     4       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset
     7       yes  non-disruptive       rolling  

Images will be upgraded according to following table:
Module       Image                  Running-Version(pri:alt)           New-Version  Upg-Required
------  ----------  ----------------------------------------  --------------------  ------------
     1      lc1n7k                                    4.2(6)                5.2(4)           yes
     1        bios     v1.10.6(11/04/08):  v1.10.6(11/04/08)                                  no
     2      lc1n7k                                    4.2(6)                5.2(4)           yes
     2        bios     v1.10.6(11/04/08):  v1.10.6(11/04/08)                                  no
     3      lc1n7k                                    4.2(6)                5.2(4)           yes
     3        bios     v1.10.6(11/04/08):  v1.10.6(11/04/08)                                  no
     4      lc1n7k                                    4.2(6)                5.2(4)           yes
     4        bios     v1.10.6(11/04/08):  v1.10.6(11/04/08)                                  no
     5      system                                    4.2(6)                5.2(4)           yes
     5   kickstart                                    4.2(6)                5.2(4)           yes
     5        bios     v3.19.0(03/31/09):  v3.19.0(03/31/09)                                  no
     5         cmp                                    4.2(1)                5.2(4)           yes
     5    cmp-bios                                  02.01.05              02.01.05            no
     6      system                                    4.2(6)                5.2(4)           yes
     6   kickstart                                    4.2(6)                5.2(4)           yes
     6        bios     v3.19.0(03/31/09):  v3.19.0(03/31/09)                                  no
     6         cmp                                    4.2(1)                5.2(4)           yes
     6    cmp-bios                                  02.01.05              02.01.05            no
     7      lc1n7k                                    4.2(6)                5.2(4)           yes
     7        bios     v1.10.6(11/04/08):  v1.10.6(11/04/08)                                  no

Additional info for this installation:
--------------------------------------

Do you want to continue with the installation (y/n)?  [n] y

Install is in progress, please wait.

Syncing image bootflash:/n7000-s1-kickstart.5.2.4.bin to standby.
[####################] 100% -- SUCCESS

Syncing image bootflash:/n7000-s1-dk9.5.2.4.bin to standby.
[####################] 100% -- SUCCESS

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 1: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 2: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 3: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 4: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 5: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 7: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS
2012 Mar 21 04:52:52 sw-n7010-ccr %$ VDC-1 %$ %PLATFORM-2-MOD_REMOVE: Module 5 removed (Serial number JAXXXXXXXXX)
2012 Mar 21 04:58:30 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP

Module 5: Waiting for module online.
 -- SUCCESS
2012 Mar 21 04:59:42 sw-n7010-ccr %$ VDC-1 %$ %IDEHSD-STANDBY-2-MOUNT: logflash: online 

Notifying services about the switchover.
[####################] 100% -- SUCCESS

"Switching over onto standby".
 writing reset reason 7, SAP(93): Swover due to install

NX7 SUP Ver 3.19.0
Serial Port Parameters from CMOS
PMCON_1: 0x200
PMCON_2: 0x0
PMCON_3: 0x3a
PM1_STS: 0x101
Performing Memory Detection and Testing
Testing 1 DRAM Patterns
Total mem found : 4096 MB
Memory test complete.
NumCpus = 2.
Status 61: PCI DEVICES Enumeration Started
Status 62: PCI DEVICES Enumeration Ended
Status 9F: Dispatching Drivers
Status 9E: IOFPGA Found
Status 9A: Booting From Primary ROM
Status 98: Found Cisco IDE
Status 98: Found Cisco IDE
Status 90: Loading Boot Loader
Reset Reason Registers: 0x1 0x0
 Filesystem type is ext2fs, partition type 0x83

              GNU GRUB  version 0.97 

Autobooting bootflash:/n7000-s1-kickstart.5.2.4.bin bootflash:/n7000-s1-dk9.5.2.4.bin...
 Filesystem type is ext2fs, partition type 0x83
Booting kickstart image: bootflash:/n7000-s1-kickstart.5.2.4.bin.......................
............................................................................Image verification OK

INIT: version 2
Checking all filesystems..r.r.r.. done.
Loading system software
/bootflash//n7000-s1-dk9.5.2.4.bin read done
Uncompressing system image: bootflash:/n7000-s1-dk9.5.2.4.bin Wed Mar 21 05:03:26 EDT 2012
blogger: nothing to do.

..done Wed Mar 21 05:03:30 EDT 2012
Load plugins that defined in image conf: /isan/plugin_img/img.conf
Loading plugin 0: core_plugin...
num srgs 1
0: swid-core-supdc3, swid-core-supdc3
num srgs 1
0: swid-supdc3-ks, swid-supdc3-ks

INIT: Entering runlevel: 3

User Access Verification
SW-N7010-CCR(standby) login:

Now we moved over to the standby supervisor, which was slot 5 at the time, to observe the upgrade complete:

Continuing with installation, please wait

2012 Mar 21 04:58:30 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP

Module 5: Waiting for module online.
 -- SUCCESS

2012 Mar 21 04:59:42 sw-n7010-ccr %$ VDC-1 %$ %IDEHSD-2-MOUNT: logflash: online 

2012 Mar 21 04:59:47 sw-n7010-ccr %$ VDC-1 %$ Mar 21 04:59:47 %KERN-2-SYSTEM_MSG: Switchover started by redundancy driver - kernel

2012 Mar 21 04:59:47 sw-n7010-ccr %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_PRE_START: This supervisor is becoming active (pre-start phase).

2012 Mar 21 04:59:47 sw-n7010-ccr %$ VDC-1 %$ %SYSMGR-2-HASWITCHOVER_START: Supervisor 5 is becoming active.

2012 Mar 21 04:59:49 sw-n7010-ccr %$ VDC-1 %$ %SYSMGR-2-SWITCHOVER_OVER: Switchover completed.

2012 Mar 21 05:00:00 sw-n7010-ccr %$ VDC-1 %$ %USER-2-SYSTEM_MSG: Load time of /isan/etc/routing-sw/cli/metro.cli_: 695539ms - ascii_cfg_server

2012 Mar 21 05:04:52 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP

2012 Mar 21 05:06:00 sw-n7010-ccr-b %$ VDC-1 %$ %IDEHSD-STANDBY-2-MOUNT: logflash: online 

Module 1: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 2: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 3: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 4: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 7: Non-disruptive upgrading.
[####################] 100% -- SUCCESS

Module 5: Upgrading CMP image.
Warning: please do not reload or power cycle CMP module at this time.
[####################] 100% -- SUCCESS

Module 6: Upgrading CMP image.
Warning: please do not reload or power cycle CMP module at this time.
[####################] 100% -- SUCCESS

Recommended action::
"Please reload CMP(s) manually to have it run in the newer version.".

Install has been successful.
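If you want to compare notes across several chassis, the "Images will be upgraded" table above is easy to summarize with a few lines of Python. This is a minimal sketch that assumes the table layout matches the capture above (module number first, yes/no last); the function name is my own.

```python
def components_needing_upgrade(table_text):
    """Return (module, image) pairs whose Upg-Required column says 'yes'."""
    pending = []
    for line in table_text.splitlines():
        parts = line.split()
        # Data rows start with a module number and end with yes/no.
        if len(parts) >= 3 and parts[0].isdigit() and parts[-1] in ("yes", "no"):
            if parts[-1] == "yes":
                pending.append((int(parts[0]), parts[1]))
    return pending


# A few rows copied from the table above.
sample = """\
Module       Image                  Running-Version(pri:alt)           New-Version  Upg-Required
------  ----------  ----------------------------------------  --------------------  ------------
     1      lc1n7k                                    4.2(6)                5.2(4)           yes
     1        bios     v1.10.6(11/04/08):  v1.10.6(11/04/08)                                  no
     5         cmp                                    4.2(1)                5.2(4)           yes
     5    cmp-bios                                  02.01.05              02.01.05            no
"""
print(components_needing_upgrade(sample))
```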

With the upgrade complete the only thing we needed to do was restart the CMPs on slots 5 and 6:

Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2012, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and

http://www.opensource.org/licenses/lgpl-2.1.php

sw-n7010-ccr.acme.org# reload cmp module 5
This command will reload the CMP on the supervisor in slot 5.  Continue (y/n)?  [no] y

sw-n7010-ccr.acme.org# 2012 Mar 21 05:23:41 sw-n7010-ccr-b %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_WENT_DOWN: Connectivity Management processor (on module 5) went DOWN
2012 Mar 21 05:24:50 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP

sw-n7010-ccr.acme.org# reload cmp module 6
This command will reload the CMP on the supervisor in slot 6.  Continue (y/n)?  [no] y

sw-n7010-ccr.acme.org# 2012 Mar 21 05:25:50 sw-n7010-ccr-b %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_WENT_DOWN: Connectivity Management processor (on module 6) went DOWN
2012 Mar 21 05:26:57 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 6) is now UP
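For planning future maintenance windows it can help to pull timings out of the console log rather than eyeballing them. Here is a minimal sketch that measures how long a CMP reload took, assuming the syslog line format shown above; the regular expression and function names are my own and may need adjusting for other NX-OS releases.

```python
import re
from datetime import datetime

# Timestamp, hostname, then the %FACILITY-SEVERITY-MNEMONIC: message.
LOG_LINE = re.compile(
    r"(\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ .*?%([A-Z0-9_-]+): (.*)"
)


def parse_events(text):
    """Yield (timestamp, mnemonic, message) for each syslog-style line."""
    for line in text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y %b %d %H:%M:%S")
            yield stamp, match.group(2), match.group(3)


def cmp_reload_seconds(text, module):
    """Seconds between the CMP on `module` going down and coming back up."""
    down = up = None
    for stamp, mnemonic, message in parse_events(text):
        if f"module {module})" not in message:
            continue
        if mnemonic.endswith("LOG_CMP_WENT_DOWN"):
            down = stamp
        elif mnemonic.endswith("LOG_CMP_UP") and down is not None:
            up = stamp
            break
    if down is None or up is None:
        return None
    return (up - down).total_seconds()


# Two lines copied from the module 5 reload above.
sample = """\
sw-n7010-ccr.acme.org# 2012 Mar 21 05:23:41 sw-n7010-ccr-b %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_WENT_DOWN: Connectivity Management processor (on module 5) went DOWN
2012 Mar 21 05:24:50 sw-n7010-ccr %$ VDC-1 %$ %CMPPROXY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP
"""
print(cmp_reload_seconds(sample, 5))
```

The same `parse_events` helper can be pointed at the rest of the capture, for example to time the gap between a supervisor being removed and coming back online.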

Cheers!