Exporting Data from Bugzilla to a Local Folder

I was recently tasked with decommissioning an old Bugzilla instance.  The company had long since moved to Jira’s SaaS offering and the old Bugzilla VM had been left to languish in read-only mode.  Now two full versions behind, the decision was made to attempt to export the Bugzilla data and import it into Jira rather than continue to update and support the aging software.

Unfortunately, there are not a lot of great options out there for moving data to an active Jira Cloud instance.  It’s trivial to connect a local Jira instance to Bugzilla and import all of its data.  However, that option is not available in Jira Cloud.  There are three options for getting Bugzilla data into Jira Cloud:

  1. Export the Bugzilla data as CSV and import it into Jira Cloud.  Attachments cannot be imported; they must be put on a separate web server and the URLs inserted into the CSV before import into Jira.  In our experience, notes were also not coming in, but due to the unpalatability of running a separate webserver for attachments, we didn’t pursue this further.
  2. Export the existing Jira Cloud instance and dump it into a local Jira instance.  Use the Jira import tool to merge in the Bugzilla data, and then export the local Jira instance and re-import it back into a Jira Cloud instance.  This would involve a substantial amount of work and downtime on a very busy Jira Cloud instance, and would involve a bit more risk than we were willing to take on.
  3. Export Bugzilla into a clean local Jira instance, and optionally then export it to a separate Cloud instance.  This would involve paying for an additional Jira Cloud instance, and it would be an extra system to manage and secure.

Because this is legacy data (regular use of Bugzilla ended several years ago), I was given the green light to export the roughly 30,500 bugs to searchable PDF files and store them on a shared local folder.  We would then be able to decommission the Bugzilla VM entirely.

Pressing CTRL-P 30,000 times

Bugzilla has a print preview mode for tickets.  It includes all of the data in the ticket including the header fields and date-stamped notes.  Fortunately, it turns out that Bugzilla, being an older “Web 2.0” product, will pull up any ticket by number in print preview mode straight from the URL.  The format is:

https://bugzilla.example.com/show_bug.cgi?format=multiple&id=2000

where ‘2000’ is the bug number. With that in mind, it was time to generate some PDFs.

After experimenting with Google Chrome’s ‘headless’ mode, I was able to write a quick batch script to iterate through the tickets and ‘print’ them to PDF. Here’s the code:

for /l %%i in (1001,1,2000) do (
  mkdir c:\bugs\bug%%i
  "c:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --user-data-dir="c:\users\tim\appdata\local\google\chrome\user data" --ignore-certificate-errors --disable-print-preview --print-to-pdf="c:\bugs\bug%%i\bug%%i.pdf" "https://bugzilla.example.com/show_bug.cgi?format=multiple&id=%%i"
    )

This code is a bit messy since the verbosity of the third line makes it very long. The ‘for’ in the first line specifies the range of bug numbers to loop through. In this case, I’m printing bugs 1001 through 2000. The second line creates a folder for the bug. I put each bug in its own folder so that the attachments can be put in the same folder as their corresponding PDF.

The third line calls Google Chrome in ‘headless’ mode. I used the ‘–disable-gpu’ and ‘–disable-print-preview’ options to fix issues I was having running this in a Server 2016 VM. The ‘%%i’ is the bug number and is passed in the URL to bring up the bug in Bugzilla. Note that I used ‘–ignore-certificate-errors’ because the cert had expired on the Bugzilla instance (ignore if not using TLS.) Possibly due to the version of Bugzilla we were running, headless Chrome would lose its cookie after a few connections. I resolved this by temporarily turning on anonymous access in Bugzilla so the bugs could be viewed without having to log in. (This was an internal system, so there was no risk having this open for a couple of hours.)

While I could easily just plug in ‘1’ and ‘30500’ into the ‘for’ loop and let it run for a few days, this batch file was taxing only one core on my local system and barely registering at all on the Bugzilla host. Since I had eight cores available, I duplicated the batch file and ran ten copies simultaneously, each pulling down 3,000 bugs. This allowed the local system to run at full load and convert more than 30,000 bugs to PDF overnight.

Attach, mate.

Our version of Bugzilla stores its attachments in a table in the database and would have to be extracted. Fortunately, Stack Overflow came to the rescue. The below SQL script parses the attachments table and generates another SQL file that extracts the attachments one at a time.

use bugs;
select concat('SELECT ad.thedata into DUMPFILE  \'/bugs/bug'
, a.bug_id
, '/bug'
, a.bug_id 
, '___'
, ad.id
, '___'
, replace(a.filename,'\'','')
, '\'  FROM bugs.attachments a, bugs.attach_data ad where ad.id = a.attach_id'
, ' and ad.id = '
, ad.id
,';') into outfile '/bugs/attachments.sql'
from bugs.attachments a, bugs.attach_data ad where ad.id = a.attach_id;

I created a temporary directory at /bugs and made sure MariaDB had permissions to write to the directory. I saved this as ‘parse.sql’ in that directory and then ran it with:

mysql -uroot -ppassword < /bugs/parse.sql

After looking at the freshly created /bugs/attachments.sql to verify that paths and filenames looked good, I edited it to insert a ‘use bugs;’ line at the top of the file. Then I created a quick shell script to create my 30,500 directories to match the directories created by my PDF script above:

#!/bin/bash
for i in {1..30500}
 do
   mkdir bug$i
 done

After running that script, I verified that all my directories were created and gave write permission to them to the ‘mysql’ user. It was then time to run my attachments.sql file:

mysql -uroot -ppassword < /bugs/attachments.sql

A quick ‘du –si’ in the /bugs folder verified that there were indeed files in some of the folders. After confirming that the attachments were named correctly and corresponded to the numbered folder they were in, I used ‘find’ to prune any empty directories. This isn’t strictly necessary, but it means fewer folders to parse later.

cd /bugs
find . -type d -empty -delete

Putting it all together

The final step in this process was to merge the attachments with their corresponding PDF files. Because I used the same directory structure for both, I could now use FileZilla to transfer and merge the folders. I connected it to the VM over SFTP, highlighted the 30,500 folders containing the attachments, and dragged them over to the folder containing the 30,500 folders created during the PDF processing. FileZilla dutifully merged all of the attachments into the PDF folders. Once completed, I spot-checked to verify that everything looked good.

I wouldn’t have used this technique for recent or active data, but for old legacy bugs that are viewed a couple of times a year, it worked. The files are now stored on a limited-access read-only file share and can be searched easily for keywords. It also allowed us to remove one more legacy system from the books.

The Littlest Datacenter Part 6: Lessons Learned

For the first post of this long saga, click here.

It’s been a year since I moved on from the company running on the Littlest Datacenter, and about two years since it was built.  As I mentioned, I built it to be as self-sufficient, flat, simple and maintainable as possible, first because I had duties beyond being the IT guy and dropping everything to hack on junk equipment wasn’t going to cut it; second because I was the only IT guy and I wanted to be able to take vacations and sleep through the night without the business falling apart; and third, because I knew that, regardless of whether I stayed with that company or not, the IT function would eventually be given to an MSP or a junior admin.

Looking back at the setup, here are some lessons learned:

Buy Supermicro carefully:  The default support Supermicro offers is depot repair.  That means you’re deracking your server, boxing it up and paying to ship it back to them for repair.  Repair can take anywhere from one to six weeks.  This sucks because Supermicro offers a lot of flexible and reliable hardware choices for systems that fall outside the mainstream.  For instance, my Veeam server fit sixteen 3.5″ hard drives and two 2.5″ SSDs for less than half the cost of the equivalent Dells and HPs, and they supported Enterprise drives that didn’t come with the Dell/HP tax.  Just be sure to add on the onsite warranty or carry spare parts.

You’re gonna need more space:  And not just disk space.  I ended up adding 8TB more disk space to my hosts to handle the high resolution cameras for the additional shipping tables added a year after the initial build.  Fortunately I had extra drive bays, but any more expansion will involve a larger tape changer and SAS expansion shelves for the hosts.

Cheaper can sometimes be better:  For a simple two-host Windows cluster, Starwind saved the company a good six figures.  It’s no Nimble, but it was fast, bulletproof and affordable.  And like I said before, Supermicro really saved the day on the D2D backup server.

A/C is the bane of every budget datacenter:  The SRCOOL12K I used did the job, but it was loud and inefficient.  I really should have pushed for the 12,000 BTU mini-split, even though it would have taken more time and money.

So is power:  I probably could have bought the modular Symmetra LX for what I paid for the three independent UPSes.  The independent units are less of a single point of failure than a monolith like the Symmetra, but I could have added enough power modules and batteries to the Symmetra to achieve my uptime goal and also power the A/C unit–something that the individual UPSes could not do.

SaaS all of the things:  Most of our apps were already in the cloud, but I implemented the PBX locally because it was quite a bit cheaper due to the number of extensions.  I’m now thoroughly convinced that in a small business, hosting your own PBX is only slightly less stupid than hosting your own Exchange Server.  Until you get to a thousand extensions and can afford to bring on a dedicated VoIP guy, let someone else deal with it.  Same goes for monitoring–I would have gladly gone with hosted Zabbix if it was available at the time.  Same with PagerDuty for alerting.

Expect your stuff to get thrown out:  My artisanally-crafted monitoring system went out the window when the MSP came in.  Same for my carefully locked down pfSense boxes.  Just expect that an MSP is going to have their own managed firewalls, remote support software, antivirus, etc.

Don’t take it personally:  Commercial pilots and railroad engineers describe the inevitable result of any government accident investigation: “They always blame the dead guy.”  That crude sentiment also applies to IT: no matter what happens after you leave, you’re going to get blamed for it.  After carefully documenting and training my replacement, I hadn’t even left when I started getting phone calls about outages, and they were basically all preventable.  The phone system was rebooted in the middle of the day.  A Windows Server 2003 box was shut down, even though it hosted the PICK application the owner still insisted on keeping around.  The firewalls were replaced without examining the existing rules first, plunging my monitoring system into darkness and causing phone calls to have one-way audio.  I answered calls and texts for two weeks, and then stopped worrying about them and focused solely on my present and future.

Write about it: Even if nobody reads your blog, outlining what you did and why, and what worked and what didn’t, will help you make better recommendations in the future.  And if someone does read it, it might help them as well.

 

The End of an Era?

I was heartbroken to read about the demise of Weird Stuff Warehouse, a Silicon Valley institution.

I remember when they were just called Weird Stuff and were located in a commercial storefront near Fry’s in Milpitas.  They had glass display cases with a few dozen parts for sale, such as hard drives and peripheral cards. Once in a while, they would have something crazy, like a giant minicomputer hard drive with a spindle motor that looked like it belonged in a washing machine.  We mostly visited just to see what was new, though I do remember when they had trash cans full of ping pong balls that they were selling by the bagful.  We bought a few dozen to throw at each other at the office.

Imagine my surprise, then, when I was assigned a weeklong project in Milpitas a few years ago, a good decade after I had moved to SoCal.  My GPS took me to a nondescript warehouse entrance.  When I walked inside, it was like a massive museum.  Stack after stack of 30-year-old hard drives, cards, motherboards, power supplies, test equipment, industrial equipment, cables, wires, displays, servers, switches, cabinets, modem banks… I spent every evening after work walking up and down the aisles, admiring and sometimes touching the Silicon Valley of my youth.

With my (and my 17-year-old son’s) excitement building about the upcoming Vintage Computer Festival West in Mountain View this summer, I Googled Weird Stuff so that my son, too, could experience the fruits of Silicon Valley on those shelves.  Alas, it turns out that Googling was what contributed to the death of this institution.  The search giant bought the building, and Weird Stuff Warehouse closed its doors and sold its inventory to a company that, as far as I can tell, doesn’t have a retail presence.

In the light of recent events involving Facebook, Uber and other companies, there’s a growing sentiment that Silicon Valley is not what it used to be.  I can’t speak to that myself; I moved out of the Bay Area almost two decades ago and haven’t followed it as closely as I used to.  But it seems that Silicon Valley, which used to be about inventing and building better stuff (hence the “silicon” in the name) has forgotten its roots a bit in its bid to grab some of that VC gold rush money.  Perhaps Silicon Valley needs to get back to building more weird stuff instead.

The Motivation

First, a confession:  this is not my first attempt at a blog.

My previous attempts typically died after a post or two.  What makes things different this time?  I heard a podcast a couple of months ago where the guests listed reasons why I should consider doing this.  Improving my writing skills, sharing tips and tricks I’ve found during the course of my work, networking with peers, and having a public body of work were all reasons that resonated with me.

So, what should readers expect?  My goal is to post something at least weekly.  With my upcoming VMware VCP training, lots of vintage equipment finding its way into my house and plenty of excitement at the office, I should be able to post with some regularity.  I expect to eventually involve YouTube at some point, but I’m going to crawl before I try to walk.  I did fire up a Twitter Account for the stuff that’s better suited to short form.

Incidentally, I’m not getting paid to say this, but it’s easy to find WordPress.com offer codes on various podcast networks right now, and it’s awesome having somebody else to the heavy lifting for less than four bucks a month.  Having had to maintain and patch WordPress sites in the past, I like being able to focus on what I want to write and not whether the next WP patch is going to nuke a custom theme.