Tracking Your Users In The Access Logs
Posted on 25 Nov 2003
by Philip Hoyt (calimehtar)
Most server log analysis applications on the market simply present usage information grouped by date, with sub-groupings like daily averages and top downloads by file size. You can see trends this way, like the spike in traffic after you send out an email, or whether you're getting most of your hits from users at work at 3 PM or at home after work. You also get basic information about where your users are coming from and how much data they are downloading. While this can be useful, it doesn't begin to touch the range of information that can be gleaned from the logs with a little creativity.
Server access logs, while limited in their flexibility, are the best source available for real, hard statistics on what your users are doing. No usability test, however expensive, gives a more accurate portrayal of usage patterns, since server logs are not affected by laboratory conditions or limits on sample size.
Note also that this article is not an introduction to server logs. For that you might want to try this Evolt article by Marlene Bruce.
Where do your users live?
Host information can be used to extrapolate what region users are accessing your site from. Most log stats systems categorize users by their top-level domain (TLD) only: .com (often categorized as 'US Commercial') or .net (labeled with the largely meaningless tag 'network'). Only those users whose ISPs have a TLD indicating nationality will be categorized in a meaningful way. Canadians will recognize what a problem this is. For example, the dominant ISP in Western Canada is Telusplanet.net. This piece of trivia is relatively easy to come by, but a web-server stats system with its default configuration will likely report a Telusplanet hit as coming from 'network'.
A recent analysis of a client's server logs, for example, categorizing ISPs by region wherever possible, revealed that nearly as many users were hitting this particular site from Ontario alone as from the USA as a whole. This information could not have been derived without altering the presets on my log analysis tool.
Analog is a popular log analysis application and is configurable enough to let you easily sort your users by ISP using a syntax like:
HOSTALIAS *.cgocable.net "Ontario.ca"
HOSTALIAS *.charter.com "USA.com"
The .ca or .com suffix is necessary because Analog treats the HOSTALIAS value as an IP address and therefore will only accept strings that are formed like IP addresses.
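The same grouping idea can be sketched outside Analog. The following Python is only an illustration, assuming you already have a list of resolved hostnames from the log; the function names and the region labels are hypothetical, and the two patterns are simply the ones from the HOSTALIAS example.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Illustrative ISP-to-region table. The patterns are the two from the
# HOSTALIAS example; a real table would need many more entries.
REGION_PATTERNS = [
    ("*.cgocable.net", "Ontario"),
    ("*.charter.com", "USA"),
]


def region_for_host(host):
    """Return the region label for a resolved hostname, or 'unknown'."""
    host = host.lower()
    for pattern, region in REGION_PATTERNS:
        if fnmatchcase(host, pattern):
            return region
    return "unknown"


def count_regions(hosts):
    """Tally hits per region for an iterable of hostnames."""
    return Counter(region_for_host(host) for host in hosts)
```

Hosts that match no pattern are reported as 'unknown' rather than falling back to the TLD, which makes gaps in the table easy to spot.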
Where did your users find your site?
Referers are incredibly useful. Probably the most obvious use is to find sites that link to the one whose logs you are studying. This will turn up influential link pages on the internet, blogs whose authors share an interest in your site, and sometimes forum discussions. It pays to know the internet community that brings you traffic: this can help you determine the motivations of your users and cater to them better, and it helps with search engine optimization. Google, for example, likes interlinking, so it will help your Google rank if you link back to some of the people who are already linking to you.
The ability to view external site referers is well-supported by popular log analysis systems which hyperlink listings so that you can view the referring pages with a single click.
What are your users searching for?
While searches that fail to satisfy a user's request obviously won't turn up in the logs, you can learn a lot from those that succeed. You may be surprised what keyword searches lead people to your site, and you can use this information, as well as results from less successful searches, to reorganize content, optimize for searches, and learn what people who visit your site may be looking for. Analysis of the logs on a recent project, for example, revealed that an inordinate number of searches were landing on PDFs which revealed nothing about the site that hosted them and provided no links back to it.
On the other hand, no log analysis application I have used has an adequate system for viewing the referring searches first-hand. You might want to see the complete referring URL for each search, including its parameters and the particular search engine used, so that you can perform the search yourself. That way you can see why, for a European villa rental service, a plausible search like "European villa rental" isn't bringing your site traffic (there are a million other sites ranked above it for this search), while a seemingly less intuitive search like "culinary chateau" is the 8th most popular search phrase (the site is on the second page of results).
In order to simplify the presentation of server statistics, all the systems I have used, including Analog and Webalizer, leave the parameters and URLs off, displaying search queries as plain text and preventing you from seeing your own search ranking first-hand. This is a major handicap, and the only work-around I'm aware of is to view the actual server logs, find the relevant information manually (using grep on Mac OS X or Linux), and paste the referring URLs into your browser.
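As a rough alternative to grepping by hand, the manual step can be sketched in Python. This assumes the server writes the common "combined" log format (with referer and user-agent fields), and the list of query-parameter names is a guess at what search engines use, not a complete list. The function name is hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined log format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed names of the parameter that carries the search phrase.
QUERY_PARAMS = ("q", "p", "query")


def search_referers(lines):
    """Yield (full referring URL, search phrase) for hits that look like searches."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the combined format
        referer = match.group("referer")
        params = parse_qs(urlsplit(referer).query)
        for name in QUERY_PARAMS:
            if name in params:
                yield referer, params[name][0]
                break
```

Because the full referring URL is kept alongside the phrase, each result can be pasted straight into a browser to repeat the search. Note that any referer carrying one of these parameters is reported, including pages on your own site, so you may want to filter by host.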
Which internal links are people following?
In the case of internal pages, you can observe which links are most used to reach certain types of information, and which pages may not be getting the attention they deserve, again by tracking referers. One log analysis application I found, Wusage, creates an ingenious, if ugly, visual navigation map in PDF form which lets you see popular documents and the most common link paths in one view using a simple tree diagram.
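For a plain-text version of the same idea, the following Python sketch counts how often each internal page leads to each other page. It is not how Wusage works; it only assumes a log format that records the referer after the status and size fields, and the function name and `site_host` parameter are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches: "GET /page HTTP/1.1" 200 2048 "http://referring/page"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def internal_link_counts(lines, site_host):
    """Count (referring page, requested page) pairs for on-site navigation."""
    pairs = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        referer = urlsplit(match.group("referer"))
        # Keep only hits whose referer is a page on the site itself.
        if referer.hostname == site_host.lower():
            target_path = urlsplit(match.group("target")).path
            pairs[(referer.path or "/", target_path)] += 1
    return pairs
```

Calling `internal_link_counts(lines, "www.example.com").most_common(10)` would then list the ten most followed internal links.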
Extending basic stats functionality with redirects
By adding a redirect page, you can track how many users are following a link from your site out to one particular document on the web. For example, on a site that has two options for selling books, downloading an order-form PDF or following a link to Amazon, the click-throughs from the author's site can be tracked by linking to a blank page with a redirect rather than directly to Amazon.
Tracking links from a newsletter back to your site, so that you can observe these hits independently from hits that come from Google or elsewhere and compare different issues of your newsletter, can be facilitated by adding parameters to the URL. These parameters don't have to be processed by an application server in order to be recorded in the server logs.
Changing http://www.MyDomain.com to http://www.MyDomain.com?issue=12 will not affect the display of the page unless you want it to, but will allow you to track the success of each consecutive newsletter issue separately without any additional work.
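Counting those hits afterwards takes only a few lines. The Python below is a minimal sketch, assuming the request line is logged in the usual `"GET /path?query HTTP/1.x"` form; the function name is hypothetical and `issue` is just the parameter name from the example URL.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Pulls the requested path (with query string) out of the quoted request line.
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<target>\S+)')


def hits_per_issue(lines, param="issue"):
    """Count requests per value of the tracking parameter."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue
        query = urlsplit(match.group("target")).query
        values = parse_qs(query).get(param)
        if values:
            counts[values[0]] += 1
    return counts
```

Requests without the parameter are ignored, so ordinary traffic does not dilute the newsletter counts.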
Track user habits and browser settings with response codes
The variety of information which can be gleaned from server response codes may be as deep as the number of possible uses there are for such codes as "303 See Other" or "403 Forbidden". See this page from the W3 Consortium's HTTP specification for the complete list of server codes.
"304 Not Modified" can be used to determine how often external CSS and JS files are being served from cache (about two thirds of the time on one site) compared with how often users had to download them (returning "200 OK"). This shows how much bandwidth is conserved by moving style information out of HTML files and into external files.
Given that first-time visitors will not have cached versions of the files on your site, and that cached files will expire eventually, "304 Not Modified" could also be used to measure fluctuations in the numbers of returning vs. unique visitors, if not the absolute figures. Unfortunately, cache expiry depends on user settings, and statistics about these settings cannot easily be found.
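One way to arrive at such a figure is sketched below in Python. It assumes the status code directly follows the quoted request line, as in the common and combined log formats, and the function name is hypothetical.

```python
import re
from collections import Counter

# Captures the requested path (without query string) and the status code.
REQUEST_STATUS = re.compile(
    r'"(?:GET|HEAD) (?P<path>[^ ?"]+)[^"]*" (?P<status>\d{3}) '
)


def cache_hit_ratio(lines, extensions=(".css", ".js")):
    """Return the share of 304 responses among 200 and 304 responses
    for files with the given extensions, or None if there were none."""
    counts = Counter()
    for line in lines:
        match = REQUEST_STATUS.search(line)
        if match and match.group("path").lower().endswith(extensions):
            counts[match.group("status")] += 1
    total = counts["304"] + counts["200"]
    return counts["304"] / total if total else None
```

Other status codes are left out of the ratio, so redirects or errors on these files do not skew the comparison between cached and downloaded copies.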
Server logs are there to be used and contain real information about real users on your site. There is no more reliable source for information on user patterns, though there are many tools that are more flexible. My one misgiving is that the log analysis tools out there today are so consistently mediocre that they will frustrate many attempts to study the data more deeply. I hope that these ideas will help provide the impetus for these features to be added to existing log analysis tools.