Exporting Data from Bugzilla to a Local Folder

I was recently tasked with decommissioning an old Bugzilla instance.  The company had long since moved to Jira’s SaaS offering and the old Bugzilla VM had been left to languish in read-only mode.  Now two full versions behind, the decision was made to attempt to export the Bugzilla data and import it into Jira rather than continue to update and support the aging software.

Unfortunately, there are not a lot of great options out there for moving data to an active Jira Cloud instance.  It’s trivial to connect a local Jira instance to Bugzilla and import all of its data.  However, that option is not available in Jira Cloud.  There are three options for getting Bugzilla data into Jira Cloud:

  1. Export the Bugzilla data as CSV and import it into Jira Cloud.  Attachments cannot be imported; they must be put on a separate web server and the URLs inserted into the CSV before import into Jira.  In our experience, notes were also not coming in, but due to the unpalatability of running a separate webserver for attachments, we didn’t pursue this further.
  2. Export the existing Jira Cloud instance and dump it into a local Jira instance.  Use the Jira import tool to merge in the Bugzilla data, and then export the local Jira instance and re-import it back into a Jira Cloud instance.  This would involve a substantial amount of work and downtime on a very busy Jira Cloud instance, and would involve a bit more risk than we were willing to take on.
  3. Export Bugzilla into a clean local Jira instance, and optionally then export it to a separate Cloud instance.  This would involve paying for an additional Jira Cloud instance, and it would be an extra system to manage and secure.

Because this is legacy data (regular use of Bugzilla ended several years ago), I was given the green light to export the roughly 30,500 bugs to searchable PDF files and store them on a shared local folder.  We would then be able to decommission the Bugzilla VM entirely.

Pressing CTRL-P 30,000 times

Bugzilla has a print preview mode for tickets.  It includes all of the data in the ticket including the header fields and date-stamped notes.  Fortunately, it turns out that Bugzilla, being an older “Web 2.0” product, will pull up any ticket by number in print preview mode straight from the URL.  The format is:

https://bugzilla.example.com/show_bug.cgi?format=multiple&id=2000

where ‘2000’ is the bug number. With that in mind, it was time to generate some PDFs.

After experimenting with Google Chrome’s ‘headless’ mode, I was able to write a quick batch script to iterate through the tickets and ‘print’ them to PDF. Here’s the code:

for /l %%i in (1001,1,2000) do (
  mkdir c:\bugs\bug%%i
  "c:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --user-data-dir="c:\users\tim\appdata\local\google\chrome\user data" --ignore-certificate-errors --disable-print-preview --print-to-pdf="c:\bugs\bug%%i\bug%%i.pdf" "https://bugzilla.example.com/show_bug.cgi?format=multiple&id=%%i"
    )

This code is a bit messy since the verbosity of the third line makes it very long. The ‘for’ in the first line specifies the range of bug numbers to loop through. In this case, I’m printing bugs 1001 through 2000. The second line creates a folder for the bug. I put each bug in its own folder so that the attachments can be put in the same folder as their corresponding PDF.

The third line calls Google Chrome in ‘headless’ mode. I used the ‘–disable-gpu’ and ‘–disable-print-preview’ options to fix issues I was having running this in a Server 2016 VM. The ‘%%i’ is the bug number and is passed in the URL to bring up the bug in Bugzilla. Note that I used ‘–ignore-certificate-errors’ because the cert had expired on the Bugzilla instance (ignore if not using TLS.) Possibly due to the version of Bugzilla we were running, headless Chrome would lose its cookie after a few connections. I resolved this by temporarily turning on anonymous access in Bugzilla so the bugs could be viewed without having to log in. (This was an internal system, so there was no risk having this open for a couple of hours.)

While I could easily just plug in ‘1’ and ‘30500’ into the ‘for’ loop and let it run for a few days, this batch file was taxing only one core on my local system and barely registering at all on the Bugzilla host. Since I had eight cores available, I duplicated the batch file and ran ten copies simultaneously, each pulling down 3,000 bugs. This allowed the local system to run at full load and convert more than 30,000 bugs to PDF overnight.

Attach, mate.

Our version of Bugzilla stores its attachments in a table in the database and would have to be extracted. Fortunately, Stack Overflow came to the rescue. The below SQL script parses the attachments table and generates another SQL file that extracts the attachments one at a time.

use bugs;
select concat('SELECT ad.thedata into DUMPFILE  \'/bugs/bug'
, a.bug_id
, '/bug'
, a.bug_id 
, '___'
, ad.id
, '___'
, replace(a.filename,'\'','')
, '\'  FROM bugs.attachments a, bugs.attach_data ad where ad.id = a.attach_id'
, ' and ad.id = '
, ad.id
,';') into outfile '/bugs/attachments.sql'
from bugs.attachments a, bugs.attach_data ad where ad.id = a.attach_id;

I created a temporary directory at /bugs and made sure MariaDB had permissions to write to the directory. I saved this as ‘parse.sql’ in that directory and then ran it with:

mysql -uroot -ppassword < /bugs/parse.sql

After looking at the freshly created /bugs/attachments.sql to verify that paths and filenames looked good, I edited it to insert a ‘use bugs;’ line at the top of the file. Then I created a quick shell script to create my 30,500 directories to match the directories created by my PDF script above:

#!/bin/bash
for i in {1..30500}
 do
   mkdir bug$i
 done

After running that script, I verified that all my directories were created and gave write permission to them to the ‘mysql’ user. It was then time to run my attachments.sql file:

mysql -uroot -ppassword < /bugs/attachments.sql

A quick ‘du –si’ in the /bugs folder verified that there were indeed files in some of the folders. After confirming that the attachments were named correctly and corresponded to the numbered folder they were in, I used ‘find’ to prune any empty directories. This isn’t strictly necessary, but it means fewer folders to parse later.

cd /bugs
find . -type d -empty -delete

Putting it all together

The final step in this process was to merge the attachments with their corresponding PDF files. Because I used the same directory structure for both, I could now use FileZilla to transfer and merge the folders. I connected it to the VM over SFTP, highlighted the 30,500 folders containing the attachments, and dragged them over to the folder containing the 30,500 folders created during the PDF processing. FileZilla dutifully merged all of the attachments into the PDF folders. Once completed, I spot-checked to verify that everything looked good.

I wouldn’t have used this technique for recent or active data, but for old legacy bugs that are viewed a couple of times a year, it worked. The files are now stored on a limited-access read-only file share and can be searched easily for keywords. It also allowed us to remove one more legacy system from the books.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.