Signing and Updating the Blog on S3

One drawback to using a static site is that when you make changes to things that are shared between pages, all of those pages have to be updated. And since I’m also GPG signing the pages, one change can end up touching a couple hundred pages. With the move to Hugo I’ve also been updating things more often than I normally do. I had forgotten about the CloudFront limit on invalidation requests (the first 1,000 paths a month are free, $0.005 per path after that).

This blog is hosted on Amazon S3 storage using their website feature, with CloudFront in front of that to handle the domain names and the SSL certificates (which are now provided by Amazon). But this means that when I change something on the blog it can take some time to propagate to the various CloudFront edge locations. So I have been using --cf-invalidate with the s3cmd sync command, which submits invalidations for all of the changed pages so that they show up quickly.
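Previously the push step looked something like this (the bucket name is a placeholder, and --cf-invalidate is the flag that submits a CloudFront invalidation for each changed file):

s3cmd sync --cf-invalidate --delete-removed ./signed-output/ s3://yourbucketname/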

I first noticed something was wrong when I got an alert from AWS that my bill was over $10 this month. It was obvious from looking at the billing dashboard that my invalidation count had gone over the limit: an extra 1535 URLs had cost me $7.68 (1535 × $0.005).

In order to limit the number of changes per post I’ve dropped the ‘recent posts’ sidebar; it’s kind of redundant since you can see the recent posts by going to the main page anyway. That keeps new posts from changing all of the previous posts. I also dropped --cf-invalidate from the s3cmd sync and will be trying a manual invalidation of /* in the CloudFront web UI instead. According to this document, that should only count as 1 request instead of one per changed page.
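For reference, the same wildcard invalidation can also be submitted from the AWS CLI instead of the web UI; something like this, with the distribution id being a placeholder:

aws cloudfront create-invalidation --distribution-id EXXXXXXXXXXXXX --paths "/*"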

Here’s the script I’m currently using. It regenerates the site in an empty ./public directory, but keeps a copy of the previous run’s unsigned output in ./pre-signed and a copy of the final signed output in ./signed-output. Running rsync --checksum between the two unsigned directories tells me which pages actually changed and need to be re-signed and copied into ./signed-output (checksums matter here because Hugo rewrites every file, so timestamps alone would flag everything as changed). All non-html files get synced directly from ./public to ./signed-output.

Now to see if a manual /* invalidation makes my URL count increment to 1536 or not…

#!/usr/bin/bash

# Where to push the changes using s3cmd
S3_BUCKET=yourbucketname

HUGO_DIR=./hugo/
# These are all under $HUGO_DIR
# BEWARE - $PUBLIC_DIR is removed before regenerating the site
PUBLIC_DIR=./public/
PRE_SIGNED=./pre-signed/
SIGNED_OUTPUT=./signed-output/

if [[ "$1" == "push" ]]; then
    PUSH=1
else
    PUSH=0
fi

# GnuPG sign a page
sign_page() {
    echo "GPG Signing $1"
    DIRNAME=$(dirname "$1")
    mkdir -p "$SIGNED_OUTPUT$DIRNAME"

    # Add the comments so they get signed
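    # The clearsign header will land above the '-->' and the signature below
    # the '<!--', so once the outer comment markers are added after signing,
    # both PGP blocks end up hidden inside HTML comments and the page still
    # renders normally in the browser.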
    printf "\n-->\n" > "$SIGNED_OUTPUT$1"
    cat "$PRE_SIGNED/$1" >> "$SIGNED_OUTPUT$1"
    printf "\n<!--\n" >> "$SIGNED_OUTPUT$1"

    # Sign it
    gpg2 -a --clearsign "$SIGNED_OUTPUT$1" || exit 1

    # Add the rest of the comments, outside the signature
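    # Final layout: <!-- [clearsign header] --> [page html] <!-- [signature] -->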
    printf "\n<!--\n" > "$SIGNED_OUTPUT$1"
    cat "$SIGNED_OUTPUT$1.asc" >> "$SIGNED_OUTPUT$1"
    printf "\n-->\n" >> "$SIGNED_OUTPUT$1"

    rm "$SIGNED_OUTPUT$1.asc"
}


# Regenerate the website
cd $HUGO_DIR || exit 1
rm -rf $PUBLIC_DIR
hugo || exit 1

# Sync the non-html files to the final output directory, also delete removed non-html files
rsync -a --checksum --delete --exclude '*.html' $PUBLIC_DIR $SIGNED_OUTPUT || exit 1

# Get a list of the html files that have changed since the last update
HTML_FILES=$(rsync -a --checksum --delete --info=name1 $PUBLIC_DIR $PRE_SIGNED | grep -E '\.html$')

# If any html files change, sign them and put them into $SIGNED_OUTPUT
for f in $HTML_FILES; do
    sign_page "$f"
done

# Check to make sure all files in $PUBLIC_DIR are also in $SIGNED_OUTPUT
if ! diff -q <(find "$PUBLIC_DIR" -printf "%P\n" | sort) <(find "$SIGNED_OUTPUT" -printf "%P\n" | sort); then
    echo "There are files missing/unexpected in $SIGNED_OUTPUT"
    exit 1
fi

if [[ $PUSH -eq 1 ]]; then
    echo "Pushing changes to S3"
    s3cmd sync -MP --no-mime-magic --default-mime-type=text/html --exclude="*.swp" --delete-removed $SIGNED_OUTPUT s3://$S3_BUCKET/

    # css mime type is always getting screwed up (even with this, cf-invalidate doesn't seem to do anything)
    s3cmd modify -P -m text/css s3://$S3_BUCKET/css/*
fi

cd ..
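
To update the blog I run the script once with no arguments, which regenerates and signs everything locally without touching S3, and then run it again with push once the output looks right (assuming it’s saved as, say, sync-blog.sh):

./sync-blog.sh
./sync-blog.sh push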