HOWTO: Caching on the cheap

A project I’ve been working on lately is going to be featured in a Super Bowl commercial this weekend. It’s probably safe to assume that the unauthenticated landing pages will get the most traffic. There is some user-generated content on these pages (read: database calls) that we want to display. This is how we cached these pages on the cheap:

Step 1: Generate the cache files:

When you run this script, it loops over all of the PATHS on the given BASE_URL for each of the PROTOCOLS and stores the results in a cache directory one level up from where the script is located. The script assumes that every path defined in PATHS ends in a ‘/’ and caches a version of the full URL both with and without that ‘/’ (having two different files for these resources makes it easier to write our .htaccess rules next).

bin/gen_cache.sh

#!/bin/bash

PATHS="/ /mobile/ /fb/"
PROTOCOLS="http"
BASE_URL="://example.com"

#-- CD into the project directory
cd "$(dirname "$0")/.."
if [ ! -d "cache" ]; then
  mkdir "cache"
fi
cd "cache"

for path in $PATHS; do
  for protocol in $PROTOCOLS; do
    base="${path//\//}"
    case "$protocol" in
      "http")
        cache_suffix="_off_cached.html"
        ;;
      "https")
        cache_suffix="_on_cached.html"
        ;;
    esac
    out_file="${base}${cache_suffix}"

    #-- Fetch the resource (-f makes curl exit non-zero on HTTP errors like 404,
    #   so error pages never get cached; -s suppresses the progress output)
    if ! curl -f -s "${protocol}${BASE_URL}${path}" > "${out_file}"; then
      #-- If the page did not download correctly, do not cache it
      rm -f "${out_file}"
    else
      #-- Only continue if the page actually loaded
      #echo "" >> "${out_file}"

      #-- If the resource is a sub directory create the alternative name for it
      #   This is so we can support /my_resource AND /my_resource/
      if [ -n "${base}" ]; then
        if [ ! -d "${base}" ]; then
          mkdir "${base}"
        fi
        cp "${out_file}" "${base}/${cache_suffix}"
      fi
    fi
  done
done
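For reference, here is a quick sketch (separate from the script itself) of the file names the loop above generates for the default PATHS with PROTOCOLS="http". Each non-root path gets a top-level copy plus a per-directory copy, which is what lets the rewrite rules match both /mobile and /mobile/:

```shell
#!/bin/bash
#-- Sketch: reproduce the naming scheme from gen_cache.sh
for path in / /mobile/ /fb/; do
  base="${path//\//}"                      # strip slashes: "/" -> "", "/mobile/" -> "mobile"
  echo "cache/${base}_off_cached.html"     # matched for a request without the trailing slash
  if [ -n "${base}" ]; then
    echo "cache/${base}/_off_cached.html"  # matched for a request with the trailing slash
  fi
done
```

Running it prints cache/_off_cached.html, cache/mobile_off_cached.html, cache/mobile/_off_cached.html, cache/fb_off_cached.html, and cache/fb/_off_cached.html.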

Step 2: Serve the cache files (to everyone but our curl script above):

Now that we have our cache files generated, all we have to do is serve them. We do, however, want our cache-generating script to always access the original source, so we only apply this rule when the requesting user agent is not curl. If that’s the case, we check that the cache file exists, and if it does, we serve it instead of the actual, uncached resource.

.htaccess

#-- Serve cached files to non-curl user agents
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^curl [NC]
RewriteCond %{DOCUMENT_ROOT}/cache/%{REQUEST_URI}_%{HTTPS}_cached.html -f
RewriteRule ^(.*)$ cache/$1_%{HTTPS}_cached.html [L]
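To make the mapping concrete, here is how a request lines up with the files generated in Step 1 (using the example.com paths from above; the doubled slash produced by REQUEST_URI is collapsed during path resolution):

```
#-- A non-curl request for http://example.com/mobile has:
#--   REQUEST_URI = /mobile, HTTPS = off
#-- so the file test is %{DOCUMENT_ROOT}/cache//mobile_off_cached.html,
#-- and the request is rewritten to cache/mobile_off_cached.html.
#-- With the trailing slash (/mobile/), the file tested is
#-- cache/mobile/_off_cached.html instead.
```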

Boom! Add our bin/gen_cache.sh file to run every minute in our crontab and we’ve got a super cheap cache!
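The crontab entry might look something like this (/var/www/example is a placeholder for wherever the project actually lives):

```
* * * * * /var/www/example/bin/gen_cache.sh >/dev/null 2>&1
```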

Note: This is only for unauthenticated pages, and there are much better ways to cache resources, like Varnish, but for something that you can configure in minutes and that only requires curl, mod_rewrite, and cron, this is a decent start.

Posted in Tech

HOWTO: Recover from a bad Upgrade from Ubuntu 10.04 to 12.04

While trying to migrate a server from Ubuntu 10.04 to 12.04 today, something went wrong. Upgrading to gcc-4.4 had unmet dependencies. While trying to solve those, gcc wanted to be installed against a newer version of the kernel, but 1and1.com doesn’t let you upgrade your kernel, so I was left in a half-upgraded state with no way up.

It took a while to find a downgrade method that worked, but I finally found http://askubuntu.com/questions/49869/how-to-roll-back-ubuntu-to-a-previous-version. The key was to use /etc/apt/preferences to prefer the packages from 10.04.
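The pinning from that answer looks roughly like this (a sketch, not the exact file I used; ‘lucid’ is the 10.04 release codename, and a priority above 1000 allows apt to downgrade already-installed packages):

```
# /etc/apt/preferences
Package: *
Pin: release a=lucid
Pin-Priority: 1001
```

After that, an apt-get update followed by apt-get upgrade (or dist-upgrade) pulls everything back to the 10.04 versions.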

Posted in Linux, Tech

HOWTO: install add-apt-repository on Ubuntu 12.04

Many sources say that installing the ‘python-software-properties’ package will fix it, but I had to install ‘software-properties-common’:

apt-get install software-properties-common

Done!

Posted in Linux, Tech