The Littlest Datacenter Part 4: Environmental and Security

You can find part 1 of this saga, including the backstory, here.

Strip malls and low-rent office/retail buildings present a number of challenges to the fledgling IT-focused company.  From poor electrical infrastructure to lack of security to lack of options for broadband, the small business IT guy has his work cut out for him.  Add to that the fact that management typically chooses space based on price, and you’re lucky if you even get the address before moving day, much less a voice in the selection process.

And so I found myself cramming 100TB of disk into a closet large enough for a 42U cabinet and a 2-post.  And when I say large enough, I mean JUST large enough.  I was able to convince them to install a door wide enough that I could roll the cabinet out on its wheels when I needed to get behind it.  To add to the fun, this closet was in the office space, so noise was a factor.

Fortunately, I had a say in how the room was built.  My goal in building the room was to maximize the use of space, keep the noise and cold air in, and provide a modicum of security; after all, sensitive equipment would be in this room.  To that end, the builder put fiberglass insulation into the walls, and then doubled the drywall on the outside facing the office. The inside walls were drywalled and then covered in plywood lagged to the studs, providing a strong base for mounting wall-based equipment and further reducing sound.  The roof was a lid consisting of solid steel sandwiched between plywood above and drywall below.  A steel door with a push button combination lock and steel frame completed the room.

As a warehousing operation, shipping was the most important function of the business, and delayed shipments could result in financial penalties.  Since shipping was a SaaS function, my goal was to provide enough power and Internet for the business to complete the bulk of the day's shipping even in the event of a power outage.  Installing a generator was impossible due to the location, so I had to settle for batteries.  I ended up with five UPSes in total.  One Tripp Lite 3kVA and one APC 3kVA split the duties in the server cabinet, and one Tripp Lite 3kVA UPS kept the 2-post (with the PoE switch for the cameras) and the wall-mounted equipment alive.  I also had a 1,500VA unit at each pair of shipping tables to power the shipping stations (Dell all-in-ones) and label printers.  Additional battery packs were added to each unit to achieve a total runtime of about five hours, which gave plenty of time to either finish the shipping day or decide whether to rent a portable generator for a longer outage.  So far, there has been only one significant outage during a shipping day, and the production line worked through it without a hitch.

Cooling for the server room was provided by a Tripp Lite SRCOOL12K portable air conditioner, with the exhaust piped into the space above the drop ceiling.  While this did the job, I would have preferred a dual-hose unit with a variable-speed inverter drive for better efficiency.  We investigated a mini-split, but due to property management requirements, it would have taken months and cost many thousands of dollars.  The server equipment could run for well over an hour before heat buildup became an issue, which was enough time to open the door and set up a fan.  Equipment could also be shut down remotely, further reducing heat production.
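
As an illustration of that remote-shutdown option, shedding the noncritical load came down to a couple of commands like these; the host and VM names below are placeholders, not the real ones:

# Sketch: shed heat by powering off noncritical workloads remotely.
# "HV02" and the VM names are hypothetical placeholders.
$secondHost = "HV02"

# Gracefully shut down the noncritical guest VMs on the second host.
Stop-VM -ComputerName $secondHost -Name "FileServer-Legacy", "Test-VM" -Force

# Then shut down the host itself once its VMs are off.
Stop-Computer -ComputerName $secondHost -Force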

In addition to the physical security, infrastructure security had to be considered as well.  To that end, I deployed a physically separate network for the surveillance, access control and physical security systems.  Endpoints ran antivirus, GPO-enforced firewalls and automatic patching, and Ninite Pro kept ancillary software up to date.  Since all of the company equipment was wired, the wireless network was physically segmented from the rest of the network and reserved for BYOD and customer use.  pfBlocker was deployed on the pfSense firewalls to block traffic to and from countries where we did not do business, and outbound traffic was initially limited to ports 80 and 443, with additional ports opened on an as-needed basis.  Finally, I deployed Snort on the firewalls themselves and in several VMs to catch any intrusions if they happened.
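
The port lockdown itself lived on the pfSense boxes and in the GPOs, but purely as an illustration of the endpoint-level posture, the equivalent local commands would look something like this (the rule names are made up):

# Illustration only: the real enforcement was done with GPOs and at the pfSense edge.
# Enable the firewall on every profile and default-deny traffic in both directions.
Set-NetFirewallProfile -Profile Domain,Private,Public -Enabled True -DefaultInboundAction Block -DefaultOutboundAction Block

# Then allow outbound web (and DNS) explicitly, adding other ports as needed.
New-NetFirewallRule -DisplayName "Allow outbound HTTP/HTTPS" -Direction Outbound -Action Allow -Protocol TCP -RemotePort 80,443
New-NetFirewallRule -DisplayName "Allow outbound DNS" -Direction Outbound -Action Allow -Protocol UDP -RemotePort 53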

Coming up: Monitoring and lessons learned.

 

The Littlest Datacenter Part 3: Backup

Part 1 of this saga can be found here.

I had an interesting backup conundrum.  I had 26TB of utterly incompressible and deduplication-proof surveillance video data that needed to be backed up.  As this video closely recorded the fulfillment operation and was used to combat fraud by “customers”, it needed to be accessible for 90 days after recording.  Other workloads that needed to be backed up included the infrastructure (AD/DNS/DHCP) servers, a local legacy file server, the PBX, and a few other miscellaneous VMs.  Those, however, totaled less than 2TB and were easily backed up using a multitude of options.

The video data was difficult to cloudify.  The initial 26TB alone was massive, and it also changed at a rate of about 50GB per hour during shipping hours.  And in the case of an actual hardware failure, pulling that 26TB back down and into production would have been equally difficult.
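
Some back-of-the-envelope math makes the point; the 100Mbps uplink figure here is just an assumption for illustration:

# Back-of-the-envelope math on why cloud backup was impractical.
# The 100 Mbps uplink figure is an assumption for illustration only.
$changeRateGBperHour = 50
$initialTB = 26
$uplinkMbps = 100

# Sustained upload needed just to keep pace with the change rate.
$neededMbps = $changeRateGBperHour * 8 * 1000 / 3600                     # ~111 Mbps
# Time to seed the initial 26TB over that uplink, in days.
$seedDays = ($initialTB * 8 * 1000 * 1000) / $uplinkMbps / 3600 / 24     # ~24 days

"{0:N0} Mbps sustained; {1:N0} days to seed" -f $neededMbps, $seedDays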

This, combined with a limited budget, led me to a conclusion I didn't want to reach: tape.  I needed to get a copy of the data out of the server room, and having a 52TB array sitting under somebody's desk wasn't appealing.  Management had also told me that an offsite copy wasn't necessary; in the event of a full-site disaster (fire, earthquake, civil unrest), the least of our worries would be defending shipments that had already been made.  With that in mind, I decided to go with a disk-to-disk-to-tape approach.

When it comes to buying server hardware with massive amounts of disk, I generally look at Super Micro.  I had been quoted a few Dell servers with 4TB drives, but the cost was breathtaking.  I wanted a relatively lightly powered box with a ton of big drives at a reasonable price, and I got it: a new 16-bay box with a single 8-core CPU, an LSI RAID controller and 64GB of RAM for less than used Dell gear.  For drives, I went with 6TB enterprise-class SAS drives: four from WD, four from HGST, four from Seagate and four from Toshiba.  I configured them in a RAID-60 with two of each model in each half of the array.  That way, if I got a bad batch of Seagate drives (which NEVER happens, of course), I could lose all four and still have a running, if degraded, array.  This arrangement gave me about 72TB usable, enough for two full backups and a number of incrementals.
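
As a quick sanity check on that usable figure:

# RAID-60 usable capacity: two RAID-6 spans of 8 drives, 2 parity drives per span.
$driveTB  = 6
$spans    = 2
$drivesPerSpan = 8
$usableTB = $spans * ($drivesPerSpan - 2) * $driveTB   # 2 * 6 * 6 = 72 TB (decimal)
$usableTB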

For tape duties, I picked up a new-old-stock Dell PowerVault 124T.  This LTO-5 SAS autoloader was chosen back when the initial build called for only 16TB of video.  Holding 16 tapes, its raw uncompressed capacity was about 24TB, and it eventually ended up using all 16 tapes for a single full backup.

Veeam was chosen to handle the backup duties because, at the time, it was the only solution I could find that could do VM-level backups AND handle SAS tape libraries.  Backups ran nightly to the local proxy, with full copies to tape weekly.  The tapes were then stored in a fireproof safe inside a steel-and-concrete vault at the other end of the building.
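
The jobs can also be driven from PowerShell when needed; a minimal sketch (the snap-in load reflects the Veeam B&R versions of that era, and the job name is hypothetical):

# Minimal sketch of kicking off a Veeam job from PowerShell; the job name is hypothetical.
Add-PSSnapin VeeamPSSnapin

# Find the nightly backup job and start it on demand.
$job = Get-VBRJob -Name "Nightly Backup"
Start-VBRJob -Job $job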

Coming soon: Environmental, monitoring and security.

The Littlest Datacenter Part 2: Internet and Firewalls

Part 1 of this saga can be found here.

As mentioned before, this was a SaaS-focused business.  Most of the vital business functions, including ordering, shipping and receiving, pricing, accounting and customer service, were SaaS, which meant a rock-solid Internet connection was required.  But again, a small business runs on a small budget.  Combine that with the fact that the business was in a strip mall, and we were lucky to get Internet at all.

Fortunately, we were able to get Fios installed reasonably quickly and at a reasonable cost.  The business had previously been running IPCop on a tiny fanless Jetway PC, but I felt we had outgrown IPCop, and the Jetway box, though still working, was a bit underpowered for what I needed.  I settled on pfSense as my firewall of choice, but I didn't want to run it on desktop hardware.

Fortunately, Lenovo had a nearly perfect solution for my budget: the RS140 server.  It was a 1U rackmount server with a four-core Xeon E3 processor with AES-NI for fast crypto, and it came with 4GB of RAM for a hair over $400.  The price was so good I bought two.  I fitted each out with an additional 4GB of RAM and two SSDs, a 240GB SanDisk and a 240GB Intel.  There was a bit of consternation when the servers turned out to have no drive trays, but it turned out I could mount the SSDs in 3.5″ adapters and fit them directly into the chassis with no drilling.

The SanDisk and Intel SSDs in each server were configured in software RAID-1 using the onboard motherboard RAID, and the integrated IPMI was finicky but good enough that I could remotely KVM into the boxes if need be.  The servers were then configured as an active/passive pair using pfSense's built-in high availability, and I used a new HPe 8-port switch to connect them to the Fios modem.

The firewalls worked so well that I bought a matching pair for the other location and connected the two sites with an IPsec tunnel so they could share files securely.

You may ask why I used dedicated hardware for the firewalls instead of virtualizing them.  The answer is that I initially did virtualize them in Hyper-V.  However, I just wasn't comfortable with the idea of running my firewalls on the same hardware as my workloads.  There had long been talk of ways to escape a VM and compromise the host, and the later revelations about hypervisor escapes through buggy virtual floppy drivers (VENOM) and side-channel data leakage a la Spectre and Meltdown confirmed my suspicions about virtualized firewalls.

Coming soon: Backup, environmental, monitoring and security.

The Littlest Datacenter Part 1: Compute and Storage

I was tasked with building a datacenter.  Okay, not really.  The company was expanding into a low-cost strip mall, which meant limited connectivity options, no power redundancy and strict rules regarding modifications.  It also meant that I was limited to two racks in a tiny closet in the middle of an office space.  Finally, as always, there was minimal budget.

The Requirements

The COO was very SaaS-focused for business applications.  As the sole IT person (with additional ancillary duties), I was happy to oblige.  File storage, office applications, email, CRM, shipping and accounting were duly shipped off to folks who do that kind of thing for a living, leaving me with a relatively small build: AD/DNS/DHCP, the phone system and surveillance.  The systems I was replacing used independent servers that replicated VMs between them, but failover was a decidedly more… manual process than I wanted.  As a shipping-focused business that was penalized for missing shipping deadlines, we needed systems that were redundant and self-healing to the extent possible within the thin budget.  Finally, I knew that I would eventually be handing the environment off to either a managed service provider or a junior admin, so everything needed to be as simple and self-explanatory as possible.

The infrastructure VM (AD, DNS, etc.) and the ancillary VMs were pretty straightforward.  The elephant in the room was the surveillance system.  Attached to 27 high-resolution surveillance cameras, it had to retain video from most of the cameras for 90 days for insurance reasons.  Once loaded with 90 days of video, it would consume 26TB of disk space and average about 50GB per hour of disk churn during business hours.

The Software

Cost considerations led me to settle on Hyper-V as my hypervisor.  Since it's included with the Windows licenses I was already buying, it was cost-effective, and it offered live migration, storage migration, backup APIs, remote replication and failover.  Standard licensing also allows two Windows VMs to run per license, further reducing costs.
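
Both of the features I leaned on most are only a cmdlet or two away; a rough sketch with placeholder host and VM names (not my actual commands):

# Sketch only; the host and VM names are placeholders.
# Live-migrate a running VM from one host to another.
Move-VM -Name "DC01" -DestinationHost "HV02"

# Or keep an offline replica of the VM up to date on the second host.
Enable-VMReplication -VMName "DC01" -ReplicaServerName "HV02" -ReplicaServerPort 80 -AuthenticationType Kerberos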

Next to consider was the storage solution.  As I mentioned, the existing server pair consisted of two independent Hyper-V systems, with one active and one passive.  Hyper-V replication kept the passive host up to date, but in the event of a failure or maintenance, failing over and failing back was a long and arduous process.  I opted for shared storage to allow HA.  Rather than roll my own shared storage, I decided to buy.

After talking with several vendors, I settled on Starwind vSAN.  I had used their trial software with good results, and it had good reviews from people who had deployed it.  Because it runs on two independent servers with independent copies of the data, it protected against not just disk failure but also host, backplane, operating system, RAID controller and motherboard failure.  Starwind sold an appliance that was an OEM-branded but very familiar Dell T630 tower server, so I ordered two.  That was substantially cheaper than sourcing the servers and vSAN software separately, and about a sixth of the cost of an equivalent pair of Dell servers plus a separate SAN.

The Hardware

I settled on a pair of midrange Xeons with 12 cores each, for 24 cores (48 threads) per host.  This was enough to process video from all of the cameras while leaving plenty of overhead for other tasks.  The T630 is an 18-bay unit with a rack conversion option.  Dual gigabit connections went to the dedicated camera switch, while another pair went to the core switches.  For Starwind, a dual-port 10-gigabit card was installed in each host; one port on each carried Starwind iSCSI traffic and the other carried Starwind sync traffic.  Both were made redundant in software, and they were direct-connected between the hosts with Twinax.  Storage in each host consisted of sixteen 4TB Dell drives plus two 200GB solid-state drives for Starwind's caching.
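
On the Windows side, the Starwind devices are typically consumed through the in-box iSCSI initiator with MPIO; a rough sketch of that plumbing (the address and IQN below are placeholders, not the real Starwind values):

# Rough sketch of the iSCSI/MPIO plumbing; the IP and IQN below are placeholders.
Install-WindowsFeature Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI

# Point the initiator at the partner's 10GbE iSCSI address and log in persistently.
# (Starwind setups typically also add a loopback path on each host.)
New-IscsiTargetPortal -TargetPortalAddress "10.10.10.2"
Connect-IscsiTarget -NodeAddress "iqn.2008-08.com.starwindsoftware:sw01-target1" -IsPersistent $true -IsMultipathEnabled $true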

In an effort to reduce complexity, I went with a flat network.  Two HPe switches provided redundant gigabit links to the teamed server NICs and the other equipment in the rack.  Stacked and dual-uplinked HPe switches connected the workstations and ancillary equipment to the core.
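
The teamed server NICs were plain Windows LBFO teams; something like this, with placeholder adapter names:

# Sketch of the LBFO team feeding the redundant gigabit links; team and adapter names are placeholders.
New-NetLbfoTeam -Name "CoreTeam" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic -Confirm:$false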

The Software

Windows Server 2012 R2 Standard provided the backbone, with Starwind vSAN running on top.  Two Windows VMs powered the AD infrastructure server and the surveillance recording server.  I later purchased an additional Windows license and built a second DC/DNS/DHCP VM on the second host.
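
Promoting that second VM was the standard two-step; a sketch, with a placeholder domain name:

# Sketch of promoting the second VM to an additional DC; the domain name is a placeholder.
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

# Add it as another domain controller with DNS in the existing domain.
Install-ADDSDomainController -DomainName "corp.example.com" -InstallDns:$true -Credential (Get-Credential)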

Coming Soon:  Firewalls, backup, environmental, monitoring and security

 

The Datastores That Would Not Die

As part of a recent cleanup of our vSphere infrastructure, I was tasked with removing disused datastores from our Nimble CS500.  The CS500 had been replaced with a newer-generation all-flash Nimble, and the VMs had been moved off a couple of months earlier.  Now that the new array was up and had accumulated some snapshots, I was cleaning up the old volumes to repurpose the array.  I noticed, however, that even though all of the files had been removed from the datastores, there were still a lot of VMs that “resided” on the old volumes.

VMware addresses this in a KB article (2105343) titled “Virtual machines show two datastores in the Summary tab when all virtual machine files are located on one datastore.” It suggests that the VM is pointing to an ISO that no longer exists on the old datastore.

After looking at the configs, I realized that, sure enough, some of the VMs were still pointing to an ISO file that was no longer on that datastore.  Easy fix, right?  Except that when I set the optical drive back to “Client Device” on one of the test VMs, it was still pointing at the old datastore.

Looking through the config again, I noticed that the floppy drive setting was missing from the HTML5 client.  I fired up the Flex client and set the floppy drive to “Client Device” as well.  Still no go.  For the few VMs that were pointing at a nonexistent ISO, setting the optical drive back to “Client Device” worked, but for VMs pointing at a nonexistent floppy image, changing the floppy to “Client Device” wasn’t working.  A bug in the floppy handling?  Perhaps.

I created a blank floppy image on one of my new datastores and pointed the VM’s floppy to that new image.  Success!  The VM was no longer listing the old datastore, and I could then set the floppy to “Client Device.”  After checking out other VMs, I realized that I had over 100 VMs that had some combination of optical drive or floppy drive pointing at a non-existent file on the old datastores.  PowerCLI to the rescue!

# Pass the VM name as the first argument, e.g.: .\fix-drives.ps1 "VM Name"
$vm = $args[0]

# Grab the current CD and floppy drive objects from the VM.
$cd = Get-CDDrive -VM $vm
$floppy = Get-FloppyDrive -VM $vm

# Point the floppy at the blank image first (works around the "Client Device" bug),
# then detach media from both the floppy and the CD drive.
Set-FloppyDrive -Floppy $floppy -FloppyImagePath "[datastorename] empty-floppy.flp" -StartConnected:$false -Confirm:$false
Set-FloppyDrive -Floppy $floppy -NoMedia -Confirm:$false
Set-CDDrive -CD $cd -NoMedia -Confirm:$false

Simply save this as a .ps1 file and pass it the name of the VM (in quotes if it contains spaces).  It will get the current floppy and CD objects from the VM, set the floppy to the blank floppy image created earlier, and then set both the CD and floppy to “NoMedia.”  This was a quick-and-dirty script, so you will have to install PowerCLI and do your own Connect-VIServer first.  Once connected, you can either specify VMs one at a time or modify the script to get VM names from a file or from vCenter itself.  All of these settings can be changed while the VM is running, so there is no need to schedule downtime to run this script.
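
If you’d rather pull the list of affected VMs straight from vCenter than feed names by hand, something like this will flag them (the datastore name is a placeholder):

# Sketch: find every VM whose CD or floppy still points at a file on the old datastore.
# "OldNimbleDS01" is a placeholder for your datastore name.
$oldDatastore = "OldNimbleDS01"

Get-VM | Where-Object {
    (Get-CDDrive -VM $_ | Where-Object { $_.IsoPath -match $oldDatastore }) -or
    (Get-FloppyDrive -VM $_ | Where-Object { $_.FloppyImagePath -match $oldDatastore })
} | Select-Object Name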

After all of this work, I found that there were still a few VMs showing up on the old datastores.  Another quick Google search revealed that any VM with a snapshot taken while the CD or floppy was mounted would still show up on that datastore.  Drat!  After clearing out those snapshots, I finally freed up the datastores and was able to delete them using the Nimble Connection Manager.
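
Clearing them out was nothing fancier than listing and removing the offending snapshots per VM; a sketch with a placeholder VM name:

# Sketch: list a VM's snapshots, then remove the ones you no longer need.
# "SomeVM" is a placeholder.
Get-VM "SomeVM" | Get-Snapshot | Select-Object VM, Name, Created

Get-VM "SomeVM" | Get-Snapshot | Remove-Snapshot -Confirm:$false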

So now a little root-cause analysis: why were there so many machines with a nonexistent CD and floppy mounted?  After seeing that they were all Windows Server 2016 VMs, I went back to our templates and realized that the tech who built the 2016 template had left the Windows ISO (long since deleted) and a floppy image (used to F6-load the paravirtual SCSI driver during OS installation) attached when he created the template.  I converted the template to a VM, removed the two mounts (using the same two-step method for the floppy) and converted it back to a template.
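
The template fix is scriptable too, if you prefer; roughly, with a placeholder template name:

# Sketch of the template round-trip; "W2016-Template" is a placeholder name.
$vm = Set-Template -Template (Get-Template "W2016-Template") -ToVM

# Detach the stale CD and floppy (same two-step workaround for the floppy as above).
Get-CDDrive -VM $vm | Set-CDDrive -NoMedia -Confirm:$false
Get-FloppyDrive -VM $vm | Set-FloppyDrive -FloppyImagePath "[datastorename] empty-floppy.flp" -StartConnected:$false -Confirm:$false
Get-FloppyDrive -VM $vm | Set-FloppyDrive -NoMedia -Confirm:$false

# Convert it back into a template.
Set-VM -VM $vm -ToTemplate -Confirm:$false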

With that job done, I’m continuing to plug away at upgrading our other vCenters from 6.5 to 6.7 U1.  Have a great day, everyone!