Publish WordPress to static GitLab Pages site

Mon August 06 2018 by Christopher Aedo

A long time ago I set up a WordPress blog for a family member. Though there are lots of other options these days, back then there were few decent choices if your requirement was a web-based CMS with a WYSIWYG editor. An unfortunate side effect of things working well was that quite a lot of content for that blog has been generated over time. That means I've also been in the business of regularly updating WordPress to protect against the exploits that are always popping up.

Recently I wanted to convince the family member that switching to Hugo would be relatively easy, and that the blog could then be hosted on GitLab (just like this one!). Trying to extract all that content and convert it to Markdown turned into a huge hassle. There were some automated scripts that got me 95% of the way there, but nothing worked perfectly. Manually updating all the posts was not something I wanted to do, so eventually I gave up the dream of moving that blog.

More recently I started thinking about this again and realized there was a solution I hadn't considered. I could continue maintaining the WordPress server but set it up to publish a static mirror and serve that with GitLab Pages (or GitHub Pages if you like). This would allow me to automate Let's Encrypt certificate renewals and eliminate the security concerns associated with hosting a WordPress site. It would, however, mean comments stop working, but that feels like a minor loss in this case because the blog did not garner many comments.

Here's the solution I came up with, and so far it seems to be working pretty well.

  • Host the WordPress site at a URL that is not linked from anywhere else, to reduce the odds of it being exploited. In this example we'll use http://private.localconspiracy.com (even though this site is actually built with Pelican).
  • Set up hosting on GitLab Pages for the public URL, https://localconspiracy.com.
  • Add a cron job that checks whether the last-build date differs between the two URLs. If the dates differ, mirror the WordPress version.
  • After mirroring with wget, update all links from the "private" version to the "public" version.
  • Do a git push to publish the new content.

These are the two scripts I use:

check-diff.sh (called by cron every 15 minutes)

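#!/bin/bash

# Grab the lastBuildDate line from each RSS feed. The feed body arrives on
# stdout, so curl's verbose output (-v with 2>&1) is not needed.
ORIGINDATE="$(curl --silent http://private.localconspiracy.com/feed/ | grep lastBuildDate)"
PUBDATE="$(curl --silent https://www.localconspiracy.com/feed/ | grep lastBuildDate)"

# Mirror only when the private feed was actually fetched and its build date
# differs from the public one. An empty ORIGINDATE means the private site
# could not be read, and mirroring then would publish an empty snapshot.
if [ -n "$ORIGINDATE" ] && [ "$ORIGINDATE" != "$PUBDATE" ]
then
  /home/doc/repos/localconspiracy/mirror.sh
fi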
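The comparison above matches the whole line that contains lastBuildDate, whitespace included. A slightly stricter variant, offered only as a sketch and not as what the script above does, extracts just the element's text and treats an empty origin date as "do nothing". The function names extract_build_date and feeds_differ are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: read RSS XML on stdin and print the text of the
# first <lastBuildDate> element, or nothing if the element is absent.
extract_build_date() {
  sed -n 's|.*<lastBuildDate>\(.*\)</lastBuildDate>.*|\1|p' | head -n 1
}

# Hypothetical helper: succeed (exit status 0) only when the origin date
# is non-empty and differs from the published date.
#   $1  build date from the private (origin) feed
#   $2  build date from the public feed
feeds_differ() {
  [ -n "$1" ] && [ "$1" != "$2" ]
}
```

In check-diff.sh this could be used as ORIGINDATE="$(curl --silent http://private.localconspiracy.com/feed/ | extract_build_date)", and likewise for PUBDATE, followed by "if feeds_differ "$ORIGINDATE" "$PUBDATE"". The 15-minute schedule itself would be a crontab entry along the lines of "*/15 * * * * /home/doc/repos/localconspiracy/check-diff.sh", assuming the script lives in the repository directory (the post does not say where it is installed).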

mirror.sh:

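#!/bin/sh

# Stop if the repository is missing, rather than running the git and mv
# commands below in some other directory.
cd /home/doc/repos/localconspiracy || exit 1

# Mirror the private site into ./private.localconspiracy.com
wget \
--mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--retry-connrefused \
--exclude-directories=comments \
--execute robots=off \
http://private.localconspiracy.com

# If nothing was downloaded, bail out so a failed mirror never replaces the
# published site with an empty snapshot.
[ -d private.localconspiracy.com ] || exit 1

# Swap the previous snapshot for the fresh mirror. git rm can remove public/
# itself once it is empty, so recreate it before moving files in.
git rm -rf public/*
mkdir -p public
mv private.localconspiracy.com/* public/.
rmdir private.localconspiracy.com

# Point every remaining absolute link at the public HTTPS site.
find ./public/ -type f -exec sed -i -e 's|http://private\.localconspiracy|https://www.localconspiracy|g' {} \;
find ./public/ -type f -exec sed -i -e 's|http://www\.localconspiracy|https://www.localconspiracy|g' {} \;

# Publish the new snapshot.
git add public/*
git commit -m "new snapshot"
git push origin master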
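The link-rewriting step runs sed over every mirrored file, images included. One possible refinement, sketched here under the assumption of GNU grep, sed, and xargs, is to touch only text files that actually contain one of the old URLs. The function name rewrite_links is hypothetical and not part of the original script.

```shell
#!/bin/sh

# Hypothetical helper: rewrite private and plain-HTTP links to the public
# HTTPS host, editing only non-binary files that contain a match.
#   $1  directory holding the mirrored site (for example ./public)
# Assumes GNU tools: grep -I skips binary files, -Z/-0 pass file names
# NUL-separated, and xargs -r does nothing when there is no input.
rewrite_links() {
  grep -rlIZ \
    -e 'http://private\.localconspiracy' \
    -e 'http://www\.localconspiracy' "$1" |
    xargs -0 -r sed -i \
      -e 's|http://private\.localconspiracy|https://www.localconspiracy|g' \
      -e 's|http://www\.localconspiracy|https://www.localconspiracy|g'
}
```

Calling rewrite_links ./public would stand in for the two find lines in mirror.sh. The replacement patterns are the same ones the script uses, with the dots escaped so they match only a literal ".".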

That's it! Now when the blog is changed, within 15 minutes the site will be mirrored to a static version and then pushed up to the repo, where it will be reflected in GitLab Pages.

This concept could be extended a little further if you wanted to run WordPress locally. In that case you would not need a server to host your WordPress blog; you could just run it on your local machine. In that scenario the blog is never exposed to the internet, so there is very little chance of it being exploited. As long as you can run wget against it locally, you could use the same approach outlined above to have a WordPress site hosted on GitLab Pages.
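For the local variant, mostly the source URL changes. The sketch below wraps the wget call from mirror.sh in a function that takes the URL as an argument. The name mirror_site, the DRY_RUN switch, and the address http://localhost:8080 are all assumptions for illustration. It adds wget's --no-host-directories and --directory-prefix options so the files land directly in the target directory whatever the source host is called.

```shell
#!/bin/sh

# Hypothetical wrapper around the wget call from mirror.sh.
#   $1  source URL (for example a local WordPress at http://localhost:8080)
#   $2  directory to mirror into
# With DRY_RUN=1 the command is printed instead of executed.
mirror_site() {
  src_url=$1
  dest_dir=$2
  # Build the command in the function's own positional parameters.
  set -- wget \
    --mirror --convert-links --adjust-extension --page-requisites \
    --retry-connrefused --exclude-directories=comments \
    --execute robots=off \
    --no-host-directories --directory-prefix="$dest_dir" \
    "$src_url"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running DRY_RUN=1 mirror_site http://localhost:8080 public prints the wget command without touching the network. If this replaced the wget call in mirror.sh, the mv and rmdir steps would presumably no longer be needed, and the sed patterns would have to match the local URL instead of http://private.localconspiracy.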