The Littlest Datacenter Part 6: Lessons Learned

For the first post of this long saga, click here.

It’s been a year since I moved on from the company running on the Littlest Datacenter, and about two years since it was built.  As I mentioned, I built it to be as self-sufficient, flat, simple and maintainable as possible, for three reasons.  First, I had duties beyond being the IT guy, and dropping everything to hack on junk equipment wasn’t going to cut it.  Second, I was the only IT guy, and I wanted to be able to take vacations and sleep through the night without the business falling apart.  Third, I knew that, regardless of whether I stayed with that company or not, the IT function would eventually be given to an MSP or a junior admin.

Looking back at the setup, here are some lessons learned:

Buy Supermicro carefully:  The default support Supermicro offers is depot repair.  That means you’re deracking your server, boxing it up and paying to ship it back to them for repair.  Repair can take anywhere from one to six weeks.  This sucks because Supermicro otherwise offers a lot of flexible and reliable hardware choices for systems that fall outside the mainstream.  For instance, my Veeam server fit sixteen 3.5″ hard drives and two 2.5″ SSDs for less than half the cost of the equivalent Dells and HPs, and it supported enterprise drives that didn’t come with the Dell/HP tax.  Just be sure to add on the onsite warranty or carry spare parts.

You’re gonna need more space:  And not just disk space.  I ended up adding 8TB more disk space to my hosts to handle the high resolution cameras for the additional shipping tables added a year after the initial build.  Fortunately I had extra drive bays, but any more expansion will involve a larger tape changer and SAS expansion shelves for the hosts.

Cheaper can sometimes be better:  For a simple two-host Windows cluster, StarWind saved the company a good six figures.  It’s no Nimble, but it was fast, bulletproof and affordable.  And like I said before, Supermicro really saved the day on the D2D backup server.

A/C is the bane of every budget datacenter:  The SRCOOL12K I used did the job, but it was loud and inefficient.  I really should have pushed for the 12,000 BTU mini-split, even though it would have taken more time and money.
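For anyone sizing cooling for a similar closet, here is a minimal sketch of the arithmetic, not a record of what I actually measured.  It converts an A/C rating in BTU/hr to watts of heat removal using the standard factor of roughly 3.412 BTU/hr per watt, then compares that to an equipment load.  The function names and the 2,000 W load are hypothetical, chosen only for illustration.

```python
# Rough cooling sanity check. Assumption: nearly all electrical power drawn
# by IT equipment ends up as heat the A/C has to remove.

BTU_PER_HR_PER_WATT = 3.412  # 1 W of heat is about 3.412 BTU/hr


def btu_per_hr_to_watts(btu_per_hr: float) -> float:
    """Convert a cooling rating in BTU/hr to watts of heat removal."""
    return btu_per_hr / BTU_PER_HR_PER_WATT


def cooling_headroom_watts(cooling_btu_per_hr: float, it_load_watts: float) -> float:
    """Return spare cooling capacity in watts (negative means undersized)."""
    return btu_per_hr_to_watts(cooling_btu_per_hr) - it_load_watts


if __name__ == "__main__":
    rating = 12_000          # BTU/hr, the nominal rating mentioned above
    example_load = 2_000.0   # watts; hypothetical load, not a measured figure
    print(f"Cooling capacity: {btu_per_hr_to_watts(rating):.0f} W")
    print(f"Headroom: {cooling_headroom_watts(rating, example_load):.0f} W")
```

A nominal 12,000 BTU/hr unit works out to roughly 3.5 kW of heat removal on paper.  Real-world capacity for a portable unit is usually lower than the nameplate, so treat the headroom figure as an upper bound.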

So is power:  I probably could have bought the modular Symmetra LX for what I paid for the three independent UPSes.  The independent units are less of a single point of failure than a monolith like the Symmetra, but I could have added enough power modules and batteries to the Symmetra to achieve my uptime goal and also power the A/C unit–something that the individual UPSes could not do.

SaaS all of the things:  Most of our apps were already in the cloud, but I implemented the PBX locally because it was quite a bit cheaper due to the number of extensions.  I’m now thoroughly convinced that in a small business, hosting your own PBX is only slightly less stupid than hosting your own Exchange Server.  Until you get to a thousand extensions and can afford to bring on a dedicated VoIP guy, let someone else deal with it.  Same goes for monitoring: I would have gladly gone with hosted Zabbix if it had been available at the time.  Same with PagerDuty for alerting.

Expect your stuff to get thrown out:  My artisanally-crafted monitoring system went out the window when the MSP came in.  Same for my carefully locked down pfSense boxes.  Just expect that an MSP is going to have their own managed firewalls, remote support software, antivirus, etc.

Don’t take it personally:  Commercial pilots and railroad engineers describe the inevitable result of any government accident investigation: “They always blame the dead guy.”  That crude sentiment also applies to IT: no matter what happens after you leave, you’re going to get blamed for it.  After carefully documenting everything and training my replacement, I hadn’t even left when I started getting phone calls about outages, and they were basically all preventable.  The phone system was rebooted in the middle of the day.  A Windows Server 2003 box was shut down, even though it hosted the PICK application the owner still insisted on keeping around.  The firewalls were replaced without examining the existing rules first, plunging my monitoring system into darkness and causing phone calls to have one-way audio.  I answered calls and texts for two weeks, and then stopped worrying about them and focused solely on my present and future.

Write about it: Even if nobody reads your blog, outlining what you did and why, and what worked and what didn’t, will help you make better recommendations in the future.  And if someone does read it, it might help them as well.

 
