Working in the Hot Aisle: Power and Cooling

Part 1 of this saga can be found here.

An hour of battery backup is plenty of time to shut down a dozen servers… until it isn’t.  The last thing you want to see is that clock ticking down while a couple thousand Windows virtual machines decide to install updates before shutting down.

We were fortunate that one of the offices we were consolidating was subleased from a manufacturer willing to sell us not only their APC in-row chillers but also a lightly used generator.  Between those and a new Symmetra PX UPS, we were on our way to breathing easy when the lights went out.  The PX provides several hours of power and is backed by the generator, which also backs up the chillers in the event of an outage.  The PX is a marvel of engineering, but it is also a single point of failure; we witnessed this firsthand with an older Symmetra LX, whose backplane failure a couple of years earlier took down everything.  With that in mind, we opted for two PDUs in each server cabinet: one fed from the UPS and generator, and one fed from city power with a massive power conditioner in front of it.  These circuits also extend into the IDFs so that building-wide network connectivity stays up in the event of a power issue.

Most IT equipment comes with redundant power supplies, so splitting the load is easy: one power supply goes to each PDU.  For the miscellaneous equipment with a single power supply, an APC 110V transfer switch handles the switching duties.  A 1U rackmount unit, it is basically a UPS with two inputs and no batteries, and it seamlessly switches from one source circuit to the other when a voltage drop is detected.

As mentioned, cooling duties are handled by APC In-Row chillers, two in each aisle.  They are plumbed to rooftop units and are backed by the generator in case of power failure.  Temperature sensors on adjacent cabinets provide readings that help the units work as a group to optimize cooling, and network connectivity allows monitoring via SNMP and/or dedicated software.  Since we don’t yet need the cooling capacity of all four units, we will be programming them to run on a schedule to balance running hours across the group.
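
For the curious, polling one of these units over SNMP is only a few lines of Python.  Below is a minimal sketch, assuming SNMP v2c is enabled on the unit with a read community of “public” and that the classic synchronous pysnmp hlapi is available (newer pysnmp releases moved to an async interface).  The address is made up, and a real monitoring script would query the unit-status OIDs under APC’s enterprise tree (1.3.6.1.4.1.318) from the PowerNet MIB rather than just sysDescr.

```python
# Minimal SNMP poll of an APC In-Row unit (a sketch, not production monitoring).
# Assumes SNMP v2c is enabled with a read community of "public" and that the
# classic synchronous pysnmp hlapi is available.
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

CRAC_IP = "192.0.2.10"            # hypothetical management address
SYS_DESCR = "1.3.6.1.2.1.1.1.0"   # sysDescr.0, just proves the unit answers
# Real temperature and fan-speed OIDs live under APC's enterprise tree
# (1.3.6.1.4.1.318) and should be taken from the PowerNet MIB for your model.

error_indication, error_status, _, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),    # mpModel=1 selects SNMP v2c
        UdpTransportTarget((CRAC_IP, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(SYS_DESCR)),
    )
)

if error_indication or error_status:
    print(f"SNMP poll failed: {error_indication or error_status.prettyPrint()}")
else:
    for oid, value in var_binds:
        print(f"{oid.prettyPrint()} = {value.prettyPrint()}")
```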

Cooling in the IDFs is handled by the building’s chiller, with an independent thermostat-controlled exhaust fan as backup. Since each IDF hosts little more than a single chassis switch, that arrangement covers its cooling needs easily. And because users are issued laptops that can ride out most outages, we were able to sidestep providing UPS power to work areas.

Next time:  Keeping the hot hot and the cool cool.

Working In the Not-Yet-Hot Aisle

“This is no longer a vacation.  It’s a quest.  It’s a quest for fun.” -Clark W. Griswold

The 48U enclosures and the in-row CRACs are in place and bolted together, but there’s no noise except the shrill shriek of a chop saw in the next room.  Drywall dust coats every surface despite the daily visits of the kind folks with mops and wet rags.  The lights overhead are working, but the three-cabinet UPS and zero-U PDUs are all lifeless and dark.

Even in this state, the cabinets are being virtually filled.  In a recently stood-up NetBox implementation back at HQ, top-of-rack switches are, contrary to their name, being placed in the middle of the enclosures.  Servers are being virtually installed, while in the physical world, blanking panels are being snapped into place and patch panels are being installed by the cabling vendor, leaving gaping holes where there will soon be humming metal boxes with blinkenlights on display.  Some gaps are bridged with blue painter’s tape, labeled in permanent marker with the eventual resting place of ISP-provided equipment.
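
As an aside, the virtual racking itself is scriptable.  Here’s a rough sketch against the NetBox REST API using the pynetbox client; the URL, token, rack, device type and role names are all hypothetical, and a couple of field names (role vs. device_role, for instance) vary between NetBox versions, so treat it as a starting point rather than gospel.

```python
# "Virtual racking" a device via the NetBox REST API with the pynetbox client.
# The URL, token, and object names here are hypothetical, and some field names
# (role vs. device_role, for example) differ between NetBox versions.
import pynetbox

nb = pynetbox.api("https://netbox.example.com", token="0123456789abcdef")

rack = nb.dcim.racks.get(name="R1", site="main-dc")              # hypothetical rack
dev_type = nb.dcim.device_types.get(model="Example-48p-ToR")     # hypothetical model
role = nb.dcim.device_roles.get(name="tor-switch")

# Place the "top of rack" switch mid-cabinet (U24, front-facing) so copper runs
# stay short in both directions.
switch = nb.dcim.devices.create(
    name="tor-r1",
    device_type=dev_type.id,
    role=role.id,              # use device_role on older NetBox releases
    site=rack.site.id,
    rack=rack.id,
    position=24,
    face="front",
)
print(f"Racked {switch.name} in {rack.name} at U{switch.position}")
```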

During the first couple of weeks after everything is bolted down, we’re pretty much limited to planning and measuring because the room is packed with contractors running Cat6, terminating fiber runs, plumbing the CRACs, putting batteries in the UPS, connecting the generator, and wiring up the fire system; it’s barely controlled chaos.  Within a couple of weeks the pace slows a bit; it’s still a hardhat-required zone, but the fiber runs to the IDFs are done and being tested, patch panels are being terminated, and the fire system has long since been inspected and signed off by the city.  A couple more weeks and we know it’s time to get serious: the sticky mat gets stuck down inside the door, the hardhat rules are rescinded, and the first CRACs are fired up.

Thus begins the saga of a small band of intrepid SysAdmins working to turn wrinkled printouts, foam weatherstripping, hundreds of cage nuts, blue painter’s tape and a couple hundred feet of Velcro into a working data center.  This marks the first time I’ve worked in a hot-aisle/cold-aisle data center, much less put one together.  It’s something I’ve wanted to do for years, but there’s remarkably little detailed information on the web about the process; the nitty-gritty of data center design and construction is usually left to consultants who keep their trade a closely guarded secret.  And indeed, we did consult with a company on the initial design and construction of our little box of heaven.

The concept of hot-aisle/cold-aisle containment is pretty straightforward and detailed in hundreds of white papers on the Internet: server and network equipment use fans to pull cool air in one side of the unit and blow heat out the other.  If you can divide your data center into two compartments, one that directs all of the cooled air from your A/C into the cold intake side of the equipment and one that directs all of the heated air from your equipment back into the A/C return, you increase the efficiency and reduce the cost of running your A/C, and you keep hot exhaust air from recirculating into the equipment intakes.  Done right, this also lets you turn up the temperature in the cold aisle, further reducing your costs, since there are no “hot spots” where equipment is drawing in already-heated exhaust air.  The methods for achieving this vary greatly.
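
A quick back-of-the-envelope calculation shows why the mixing matters.  Using the standard sensible-heat rule of thumb (CFM ≈ 3.16 × watts ÷ ΔT in °F), the airflow the cooling units must move for a fixed heat load scales inversely with the return-to-supply temperature difference they see; let hot and cold air mix and the return temperature drops, the ΔT shrinks, and the required airflow balloons.  The numbers in the sketch below are purely illustrative.

```python
# Back-of-the-envelope airflow math behind containment (illustrative numbers only).
# Rule of thumb for sensible heat at sea level: CFM = 3.16 * watts / delta_T_F,
# which follows from q(BTU/hr) = 1.08 * CFM * delta_T and 1 W = 3.412 BTU/hr.

def required_cfm(heat_load_w: float, delta_t_f: float) -> float:
    """Airflow needed to move heat_load_w watts across a delta-T of delta_t_f deg F."""
    return 3.16 * heat_load_w / delta_t_f

cabinet_load_w = 5000   # a hypothetical 5 kW cabinet

# When hot exhaust mixes back toward the intakes, the return temperature at the
# cooling units drops, the delta-T they work across shrinks, and the same heat
# load demands far more airflow (and more running units).
for delta_t in (10, 18, 25):   # deg F between return and supply air
    print(f"delta-T {delta_t:>2} F needs {required_cfm(cabinet_load_w, delta_t):,.0f} CFM")
```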

And more importantly, it turns out that there are some caveats that can either increase that initial cash outlay significantly or reduce overall efficiency.

Stay tuned as I dig into the details of this new project.