Make Go do PHP things

2023-06-14

Or, how to render static HTML files from a database dump of a PHP blogging system.

Background

NucleusCMS was one of the first open source projects I personally participated in, back at the dawn of the Web 2.0 and LAMP (Linux+Apache+MySQL+PHP, anyone still remember it?) days.

I picked it over WordPress back in the day because it natively supported multiple blogs from the very beginning (the 1.0 release), and I discovered a lot of other cool features (like revisions) along the way.

But WordPress “won”, and NucleusCMS, which relied on enthusiastic developers and users, eventually shut down in 2014.

Although it was later “brought back” in 2016 to support PHP7, PHP8 was released in 2020 and Nucleus does not support it. To keep Nucleus running, I have to keep PHP7 installed on the web server.

Now that Debian Bookworm was released last weekend, Debian Bullseye, the last version that ships with PHP7, is oldstable. So keeping PHP7 running on the web server is becoming a bigger and bigger hassle.

I switched to this blog system almost a year ago, but had to keep PHP7 running for my old blog and posts. When I switched, I promised to keep the old posts available as a read-only, static rendering. I’m now finally fulfilling that promise.

How

Nucleus stores blog posts in a MySQL database, so as long as I can read them, I can script the rendering.

But instead of talking directly to the database, there’s actually an easier way: read the database dump.

MySQL has a handy tool called mysqldump, and I actually have a cronjob to make dumps daily for backup purposes. Those dumps are text files with huge INSERT INTO lines (1 or 2 INSERT INTO lines per table), so I can just get the data out of those lines.

To make things easier for me, the code requires you to run mysqldump with the --complete-insert flag, so that the column names are included in each INSERT INTO line.
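As a rough illustration of why --complete-insert helps, here is a minimal sketch (not the actual code from this project) that pulls the table name and column names out of the start of such a line. The function name parseInsertHeader and the `posts` table with its columns are made up for the example, and the regular expression assumes no column name contains a closing parenthesis.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// insertHeader matches the start of a line written by
// mysqldump --complete-insert:
//   INSERT INTO `table` (`col1`, `col2`) VALUES (...),(...);
// Assumption: column names never contain ')'.
var insertHeader = regexp.MustCompile("^INSERT INTO `([^`]+)` \\(([^)]*)\\) VALUES ")

// parseInsertHeader (hypothetical helper) returns the table name, the
// column names, and the remainder of the line holding the value tuples.
// ok is false when the line is not an INSERT INTO line of that shape.
func parseInsertHeader(line string) (table string, cols []string, rest string, ok bool) {
	m := insertHeader.FindStringSubmatchIndex(line)
	if m == nil {
		return "", nil, "", false
	}
	table = line[m[2]:m[3]]
	for _, c := range strings.Split(line[m[4]:m[5]], ",") {
		// Drop the separating space and the surrounding backticks.
		cols = append(cols, strings.Trim(strings.TrimSpace(c), "`"))
	}
	return table, cols, line[m[1]:], true
}

func main() {
	// Made-up table and columns, for illustration only.
	line := "INSERT INTO `posts` (`id`, `title`, `body`) VALUES (1,'Hello','<p>Hi</p>');"
	table, cols, rest, ok := parseInsertHeader(line)
	fmt.Println(ok, table, cols, rest)
}
```

With the column names known, each value tuple in the remainder can be mapped to its columns by position.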

Here is the first interesting thing in this project: I initially used Go’s bufio.Scanner to read the file, but those huge lines hit its token size limit (64 KiB by default), and I had to switch to bufio.Reader instead.
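A minimal sketch of the difference, assuming the dump is read line by line. The helper names readLines and lineLengths are invented for this example and are not from the project's code.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readLines (hypothetical helper) calls fn for every line in r.
// bufio.Reader.ReadString grows its result as needed, so a line can be
// arbitrarily long.
func readLines(r io.Reader, fn func(line string)) error {
	br := bufio.NewReader(r)
	for {
		line, err := br.ReadString('\n')
		if len(line) > 0 {
			fn(strings.TrimRight(line, "\r\n"))
		}
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// lineLengths returns the length of every line in input.
func lineLengths(input string) ([]int, error) {
	var lens []int
	err := readLines(strings.NewReader(input), func(line string) {
		lens = append(lens, len(line))
	})
	return lens, err
}

func main() {
	// One line twice as long as the Scanner's default limit.
	long := strings.Repeat("x", 2*bufio.MaxScanTokenSize)
	input := "short\n" + long + "\n"

	// With its default buffer, the Scanner stops at the long line and
	// reports bufio.ErrTooLong.
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
	}
	fmt.Println("Scanner error:", sc.Err())

	lens, err := lineLengths(input)
	fmt.Println("Reader line lengths:", lens, "error:", err)
}
```

Raising the limit with the Scanner's Buffer method is another option, but it needs an upper bound chosen in advance, which is awkward when the line length depends on how much data a table holds.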

Another interesting thing here is string unescaping. The strings stored in the dump file are inside single quotation marks, with both single and double quotation marks, and some other characters, escaped. Like this:

'It\'s time to say \"Hello\"\r\nAnd a new line.'

At first I tried to use strconv.UnquoteChar with a single quotation mark as the quote arg, but the problem is that the quote arg decides which escaped quotation mark is allowed: with a single quotation mark, \' is accepted but \" causes an error. In the end I did the unescaping of \' manually, then fed the entire string into strconv.Unquote to do the rest of the unescaping.
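Here is a minimal sketch of that two-step approach, not the project's actual code. The function name unescapeSQLString is made up, and wrapping the body in double quotation marks before calling strconv.Unquote is an assumption about how the pieces fit together. Escape sequences that Go does not recognize (MySQL's \0 and \Z, for instance) would make this sketch return an error.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescapeSQLString (hypothetical helper) turns a single-quoted string
// literal from a dump into its plain value.
func unescapeSQLString(lit string) (string, error) {
	if len(lit) < 2 || lit[0] != '\'' || lit[len(lit)-1] != '\'' {
		return "", fmt.Errorf("not a single-quoted literal: %q", lit)
	}
	body := lit[1 : len(lit)-1]

	var sb strings.Builder
	sb.WriteByte('"') // strconv.Unquote expects a double-quoted Go literal here
	for i := 0; i < len(body); i++ {
		c := body[i]
		if c == '\\' && i+1 < len(body) {
			next := body[i+1]
			if next == '\'' {
				// Step 1: unescape \' manually, since Go rejects it
				// inside a double-quoted literal.
				sb.WriteByte('\'')
			} else {
				// Keep every other escape pair (including \\) intact
				// for strconv.Unquote.
				sb.WriteByte('\\')
				sb.WriteByte(next)
			}
			i++ // the pair is consumed together
			continue
		}
		sb.WriteByte(c)
	}
	sb.WriteByte('"')

	// Step 2: let strconv.Unquote handle \", \r, \n, \\ and friends.
	return strconv.Unquote(sb.String())
}

func main() {
	s, err := unescapeSQLString(`'It\'s time to say \"Hello\"\r\nAnd a new line.'`)
	fmt.Printf("%q %v\n", s, err)
}
```

Walking the string pair by pair, rather than doing a plain search and replace of \', avoids misreading an escaped backslash followed by a quotation mark.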

After reading the important data out of the DB dump, the next step is to render it. For that I used Go’s html/template package. One thing to note here is that the blog content in the DB is already HTML, but html/template escapes any plain string it inserts into the page for safety reasons. At first I tried to use the same trick the PandaBlog code used (to put HTML rendered from markdown into the final HTML page), but then discovered that I can use the template.HTML type to mark content as trusted and skip the escaping. This also let me simplify the rendering of the footer markdown in PandaBlog.
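A minimal sketch of the difference between a plain string and template.HTML. The post struct, the template text, and renderPost are invented for this example. Only content you fully trust should be converted to template.HTML, since it bypasses the escaping.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// post is a made-up data type for this example.
type post struct {
	Title string        // plain text: escaped by html/template
	Body  template.HTML // trusted HTML: emitted as-is
}

var postTmpl = template.Must(template.New("post").Parse(
	"<h1>{{.Title}}</h1>\n<div>{{.Body}}</div>\n"))

// renderPost (hypothetical helper) renders one post to a string.
func renderPost(title, bodyHTML string) (string, error) {
	var sb strings.Builder
	err := postTmpl.Execute(&sb, post{
		Title: title,
		Body:  template.HTML(bodyHTML),
	})
	return sb.String(), err
}

func main() {
	out, err := renderPost("Q & A", "<p>Hi</p>")
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	// The title's "&" is escaped to "&amp;"; the body's tags are kept.
	fmt.Print(out)
}
```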

In the end, the code renders an HTML file for each blog post (without comments), plus an index HTML page (my main motivation here is just to keep the old permalinks working, so I didn’t bother with any of the category or monthly archive pages). To keep the old permalinks working, I added this part to the Nginx config:

        location /blog/item {
                # Keep old permalinks working
                try_files $uri $uri.html =404;
        }

You can see the results here.

#English #go #php #nucleuscms