As I’ve mentioned in the past, we’re kicking off a very large endeavor to move a significant number of our servers to a virtual environment. Over the past two weeks we built out an HP BladeSystem c7000 enclosure with 4 HP BL460c server blades, 6 HP Virtual Connect 1/10Gb-F Ethernet interconnects, and 2 HP Virtual Connect 8Gb 24-Port Fibre Channel interconnects. This hardware will serve as a temporary staging location while we perform the physical-to-virtual conversions, before we move the virtual machines (across the network) to a new data center where additional VMware vSphere 4 clusters will be waiting.
We had some small issues when we first turned up the enclosure, but the biggest hurdle was our unfamiliarity with Virtual Connect and locating the default factory passwords. (We had ordered the enclosure as a special build, so it came pre-assembled, which saved us a lot of time and effort and was well worth the small added cost.)
We’re currently using two Nortel Ethernet Routing Switch 5530s in a stack configuration mounted at the top of the rack. We also have a Nortel Redundant Power Supply Unit (RPSU) 15 installed to provide redundant power in the event that we lose one of the room’s UPSs or have an issue with an internal power supply in either ERS 5530 switch. We loaded software release 6.1 onto the ERS 5530s and so far haven’t observed any issues. We’re initially connecting the ERS 5530 stack via two 1000BASE-SX (1Gbps) uplinks, distributed across both ERS 5530 switches (DMLT), to a pair of Nortel Ethernet Routing Switch 8600s running in a cluster configuration using IST/SMLT (SLT) trunking. As the solution grows we can expand the uplink capacity by adding additional 1Gbps uplinks or by installing 10Gbps XFPs. We’re downlinking from the ERS 5530 stack to multiple HP Virtual Connect 1/10Gb-F modules using LACP. Unfortunately, you can’t have a LAG span multiple HP Virtual Connect 1/10Gb-F Ethernet modules at this time. If you do, only the ports on one of the modules will be “Active” while the ports on the other modules will be in “Standby”.
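For anyone curious what the switch side of those LACP downlinks looks like, here’s a rough sketch of the interface-level LACP configuration on the ERS 5530 (this is from memory, so treat the exact syntax as an approximation and verify it against the release 6.1 command reference; the port numbers and key are just placeholders for the ports cabled to a single Virtual Connect module):
interface fastEthernet 1/45,1/46
lacp key 10
lacp mode active
lacp aggregation enable
exit
Because of the Virtual Connect limitation above, all of the ports in any one LAG have to land on the same 1/10Gb-F module.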
The HP Virtual Connect 1/10Gb-F Ethernet interconnects each provide 16 internal 1Gb downlinks, 4 external 10/100/1000BASE-T uplinks, 2 external 1Gb SFP uplinks, 2 external 10Gb XFP uplinks, 1 external 10Gb CX4 uplink, and 1 internal 10Gb cross-connect. Using the internal 10Gb cross-connect along with the external 10Gb CX4 uplink, you can create a 10Gbps network within the enclosure. You can also link multiple enclosures together to form a 10Gbps network contained entirely within the rack. This could be very beneficial in keeping vMotion and other traffic that never needs to leave the rack off the core uplinks.
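As an example of how we could take advantage of that, here’s roughly what it would look like from the ESX 4 service console to put the VMkernel (vMotion) interface on a dedicated vSwitch that uplinks to that internal 10Gbps network (the vmnic number, port group name, and addressing below are just placeholders for illustration; you’d still enable vMotion on that VMkernel interface from the vSphere Client):
esxcfg-vswitch -a vSwitch1                                    # create a dedicated vSwitch
esxcfg-vswitch -L vmnic3 vSwitch1                             # uplink the NIC that maps to the internal 10Gb network
esxcfg-vswitch -A VMotion vSwitch1                            # add a port group for vMotion
esxcfg-vmknic -a -i 192.168.100.11 -n 255.255.255.0 VMotion   # add the VMkernel interface
With that in place the vMotion traffic rides the internal cross-connect instead of the 1Gbps core uplinks.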
In testing we ran into a significant problem that already appears to have been documented by HP, although a solution has yet to be formulated. While testing several failure scenarios (physically removing the HP Virtual Connect Ethernet interconnects or remotely powering them down), we observed a significant problem when the interconnects were restored. The HP Virtual Connect 1/10Gb-F would show no link to the blade server, while the VMware vSphere 4 console would indicate that there was link. This problem obviously affected all traffic associated with that port group. The workaround was to either reboot the VMware host or reset the NIC using ethtool -r {NIC} from the server console.
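For reference, here’s what that workaround looks like from the ESX service console (vmnic2 is just a placeholder for whichever NIC is affected):
ethtool vmnic2 | grep "Link detected"   # confirm what the driver thinks the link state is
ethtool -r vmnic2                       # restart negotiation, which brings the link back up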
Here’s the excerpt from the release notes:
When a VC-Enet module is running VC v1.30 firmware or higher, a link might not be re-established between the module and the ports of an NC364m mezzanine card under the following conditions:
- The network mappings are changed on the NIC ports through Virtual Connect.
- The VC-Enet module is restarted from a power-cycle or reboot, or the module is removed and inserted.
If the server is rebooted, the link is established on all ports on both sides of the connection. Manually toggling the link from the server should also restore the link.
The jury is still out on HP’s Virtual Connect although I hope to dig deeper in later posts.
Cheers!
References:
http://h18004.www1.hp.com/products/blades/components/c-class-interconnects.html
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c01730696/c01730696.pdf
John Naysmith says
Have you considered using beacon probing in ESX for upstream link detection failure?
This might work around the issue where the midplane connections consider themselves connected, even with the uplinks disconnected.
I’d also be interested in the firmware version of VC, and in seeing how you have configured SmartLink in your Virtual Connect Ethernet network definitions, as this can also impact whether the blade detects an upstream failure.
We have 40+ enclosures with this config and will be testing upgrading to vSphere soon, so I expect to share your pain.
cheers.
Michael McNamara says
Hi John,
We haven’t tested beacon probing yet, although it is on the long list of things we need to research and test. We have yet to test the SmartLink functionality at all, although we are interested in duplicating the functionality we are accustomed to with Nortel’s VLACP. We are running Virtual Connect v2.10, the latest and greatest, although we did have to upgrade it when it arrived from the factory.
We believe the problem stems from the Intel-based chipset used on the mezzanine card (NC360m, BLc 2-port Gigabit mezzanine adapter). We had no problems with the built-in NICs that come with the blades, which we believe are based on a Broadcom chipset. We’ve opened a ticket with HP and the only solution is to RMA the mezzanine adapters for a model that has a Broadcom chipset. We should receive the RMA by tomorrow so I’ll keep you posted.
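If you want to check which chipset/driver each NIC is using on your own hosts, ethtool will tell you from the ESX service console (the vmnic numbers below are placeholders, and the driver names will depend on the hardware, e.g. a Broadcom driver such as bnx2 for the onboard NICs versus an Intel driver such as e1000/e1000e for the mezzanine ports):
ethtool -i vmnic0   # driver/firmware details for an onboard NIC
ethtool -i vmnic2   # driver/firmware details for a mezzanine NIC port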
On another note, 40+ enclosures is a huge number of virtual servers. I would guess that we could easily fit 500 guests on 16 BL460c blades in a single enclosure. If you have 40 enclosures, that’s easily over 20,000 guests! I certainly wouldn’t want to see the electric bill for 40+ enclosures, although it really makes you think about scaling.
Thanks for the comment!