I like looking at visitor statistics on my personal blog, not because it really matters or affects anything… I just think it’s pretty interesting. I looked at popular options like Google Analytics, but they didn’t quite fit what I was looking for. I decided I had a few requirements:
Items #1 and #2 were usually mutually exclusive, except for some open source projects. Those projects generally seemed like more hassle to set up than they were worth. A lot of options also required including extra JavaScript files, which I wanted to avoid too.
In the end, I settled on just using the logs generated by the Apache web server and wrote a small Ruby script to parse those log files.
The default Apache logs include a good amount of information, but you can also configure Apache to log some additional information that’s useful for site analytics, such as the Referer[1] and User-Agent header fields. You can find the Apache log configuration documentation here.
This is a snippet of my personal blog’s Apache config file with a custom log format:
# /etc/apache2/sites-available/caleb.software.conf
<VirtualHost *:80>
    ServerName caleb.software
    ErrorLog ${APACHE_LOG_DIR}/error.log
    LogFormat "%t %h %U %>s %{Referer}i %{User-Agent}i" blog
    CustomLog /var/log/site/custom.log blog
</VirtualHost>
First, we define a custom LogFormat named “blog” and then set the CustomLog file path and format. Here’s what each of the custom log flags means:
- %t is the time of the request
- %h is the requester’s IP (or hostname if you turn on reverse hostname lookups)
- %U is the page requested (like “/index.html”)
- %>s is the request status code (e.g. 200)
- %{xxx}i are specific request header fields

Make sure Apache has permission to write to the custom log file’s destination. I just created an empty file with touch custom.log and then chown’d it.
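To make the format concrete, here is a minimal sketch of how one line written with that LogFormat could be split back into fields in Ruby. It is not the script from this post: the `LOG_LINE` pattern, the `parse_log_line` helper, and the sample line are all made up for illustration, and the sketch assumes the path and Referer contain no spaces.

```ruby
require "time"

# Matches lines written with: LogFormat "%t %h %U %>s %{Referer}i %{User-Agent}i"
# The User-Agent comes last and is unquoted, so it takes the rest of the line.
LOG_LINE = /\A\[(?<time>[^\]]+)\] (?<host>\S+) (?<path>\S+) (?<status>\d{3}) (?<referer>\S+) (?<user_agent>.*)\z/

# Hypothetical helper: returns a Hash for one log line, or nil if it does not match.
def parse_log_line(line)
  m = LOG_LINE.match(line.chomp)
  return nil unless m

  {
    # %t is written like [10/Oct/2023:13:55:36 -0700]
    time: Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z"),
    host: m[:host],
    path: m[:path],
    status: m[:status].to_i,
    # Apache writes "-" when the request had no Referer header
    referer: m[:referer] == "-" ? nil : m[:referer],
    user_agent: m[:user_agent]
  }
end

# Made-up sample line, not taken from a real log.
sample = "[10/Oct/2023:13:55:36 -0700] 203.0.113.7 /index.html 200 https://example.com/ Mozilla/5.0 (X11; Linux x86_64)"
entry = parse_log_line(sample)
puts "#{entry[:path]} -> #{entry[:status]}"
```

Putting the User-Agent last is what makes this workable without quoting: everything after the Referer is treated as a single field, spaces included.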
I wrote an accompanying Ruby script to process these log files and get some insights from them. Since my blog is a static site built with Jekyll, the Ruby script just spits out a Markdown file which gets built into the final site. The script is only about fifty lines long, and you can find it here as a GitHub Gist.
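If you’re curious what “spitting out a Markdown file” might look like, here is a rough sketch. It is not the Gist itself: the `markdown_table` helper, the page title, and the numbers are placeholders I made up for the example.

```ruby
# Hypothetical helper: renders a header row and data rows as a Markdown table.
def markdown_table(headers, rows)
  lines = []
  lines << "| #{headers.join(' | ')} |"
  lines << "| #{headers.map { '---' }.join(' | ')} |"
  rows.each { |row| lines << "| #{row.join(' | ')} |" }
  lines.join("\n")
end

# Jekyll only processes files that begin with front matter; the title is a placeholder.
front_matter = "---\ntitle: Stats\n---\n\n"

# Made-up numbers, only here to show the output shape.
rows = [["/index.html", 42], ["/about.html", 7]]
page = front_matter + markdown_table(%w[Page Views], rows) + "\n"

# A real script would write this into the site source, e.g. with File.write.
puts page
```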
The resulting Markdown file is also pretty simple, with just a couple of tables listing the most popular pages (including total views and number of unique viewers) and the most frequent referrers. Unique viewers are determined by combining the User-Agent string with the requester’s IP address… definitely not a perfect method, but good enough to give us a rough estimate.
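Here is one way that counting could be sketched in Ruby. Again, this is an illustration rather than the actual script: the `page_stats` and `top_referrers` helpers are hypothetical, the entries are made-up data in an assumed shape (hashes with `host`, `path`, `referer`, and `user_agent` keys), and `tally` needs Ruby 2.7 or newer.

```ruby
require "set"

# Made-up entries, not real traffic.
entries = [
  { host: "203.0.113.7",  path: "/index.html", referer: "https://example.com/", user_agent: "Firefox" },
  { host: "203.0.113.7",  path: "/index.html", referer: nil,                    user_agent: "Firefox" },
  { host: "198.51.100.2", path: "/index.html", referer: "https://example.com/", user_agent: "Safari" },
  { host: "198.51.100.2", path: "/about.html", referer: nil,                    user_agent: "Safari" }
]

# Hypothetical helper: [path, total views, unique viewers], most viewed first.
def page_stats(entries)
  views = Hash.new(0)
  viewers = Hash.new { |hash, path| hash[path] = Set.new }

  entries.each do |e|
    views[e[:path]] += 1
    # A "viewer" is approximated by the IP address and User-Agent pair.
    viewers[e[:path]] << [e[:host], e[:user_agent]]
  end

  views.sort_by { |_path, count| -count }
       .map { |path, count| [path, count, viewers[path].size] }
end

# Hypothetical helper: referrers ordered by how often they appear.
def top_referrers(entries)
  entries.map { |e| e[:referer] }.compact.tally.sort_by { |_ref, count| -count }
end

page_stats(entries).each do |path, total, unique|
  puts "#{path}: #{total} views, #{unique} unique"
end
top_referrers(entries).each { |ref, count| puts "#{ref}: #{count}" }
```

Storing the pairs in a Set means a repeat visit from the same IP and browser bumps the total view count but not the unique count, which is exactly the rough estimate described above.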
Finally, the Ruby script is run every hour by a simple crontab entry:
0 * * * * cd /var/www/site && ruby analysis.rb && jekyll build
In the future, I have a few extra features I want to add to the script. It would be nice to see a chart of the number of viewers of each page over time. It would also be cool to see which posts result in the most email signups and engagement.
[1] The English word is actually spelled “referrer”, but because of a funny typo back in the 90s the HTTP header is named “referer”. You can read more about it on Wikipedia.