A Brief Introduction To Server Logs
Posted on 11 Jul 1999
by Marlene Bruce (marlene)
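Q. What is a server log?
A. A server log is a file, or set of files, in which your web server records each request it receives. Analyzed properly, your logs can tell you: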
- Which pages of your web site were viewed, and how often,
- How many visits your web site received,
- Which sections/pages of your site were most popular during a given period of time (say a week or a month),
- Where your visitors came from (in some cases),
- What browsers and platforms (operating systems) they were using,
- What times of the day and days of the week your server is busiest,
- What keyword searches lead people to your site,
- What errors people are encountering,
- On which pages people are most frequently entering your site,
- How long the average "view time" was for a given web page,
- Who your most frequent visitors are (in some cases),
- and possibly more.
Q. What's the difference between hits, page views, and sessions?
A. The number of "hits" a site receives can be a misleading way to judge its popularity. It is important that you understand each of the following terms:
- Hit: A hit is counted when any file is accessed during a user session, whether it is an HTML page, a graphic, or another file type. In other words, accessing an HTML page with five graphics will count as six hits. Also, if a user reloads a page during a session, it is counted as a fresh hit (or hits) (likewise when a user restarts their browser and then revisits a page).
- Page View: A page view is counted when an HTML file is accessed during a user session, regardless of the graphics on the page. In other words, accessing an HTML page with five graphics will count as one view. How repeat visits are handled depends on your analysis software. Some tools count each HTML page only once for the entire session, no matter how many times that user viewed it during the session. Others count each reload as a fresh view, and likewise a revisit after the user restarts their browser.
- User Sessions: A session of activity (all hits and views) for one visitor to a web site. Visitors are told apart by their IP address. A user session ends when the user has been inactive for longer than a set period, typically 30 minutes (the exact timeout depends on your software), or when the user quits their browser.
- Home Page Views: Number of requested views of the home page during a given period. If a user visits the home page during their user session, it is counted as one view for the entire session (regardless of the actual times the home page was viewed by that user during that session).
- Entire Site Hits: The number of hits to the entire site for the given period, including all graphics, HTML files, and other file types (such as PDF).
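To make the difference between these counts concrete, here is a minimal Python sketch that tallies hits, page views, and user sessions from a list of requests. It makes several assumptions. Each request has already been reduced to a hypothetical (ip, timestamp, path) tuple, and the list is in time order. Only files ending in .htm or .html count as pages. Every page request counts as a view, reloads included. A new session starts when the same IP address has been idle for more than 30 minutes. Real log analyzers differ on all of these details.

```python
from datetime import datetime, timedelta

# Assumptions for this sketch, not fixed rules:
SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity that ends a session
PAGE_SUFFIXES = (".htm", ".html")        # file types counted as page views


def summarize(requests):
    """Return (hits, page_views, sessions) for a time-ordered list of
    hypothetical (ip, timestamp, path) request tuples."""
    hits = len(requests)  # every file requested counts as a hit
    page_views = sum(1 for _, _, path in requests
                     if path.lower().endswith(PAGE_SUFFIXES))
    sessions = 0
    last_seen = {}  # ip -> time of that visitor's previous request
    for ip, when, _ in requests:
        previous = last_seen.get(ip)
        if previous is None or when - previous > SESSION_TIMEOUT:
            sessions += 1  # new visitor, or idle long enough to start over
        last_seen[ip] = when
    return hits, page_views, sessions


# Made-up sample: a page with one graphic, a second page, another
# visitor, then the first visitor returning 45 minutes later.
t0 = datetime(1999, 7, 11, 9, 0, 0)
sample = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(seconds=2), "/logo.gif"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/about.html"),
    ("10.0.0.2", t0 + timedelta(minutes=6), "/index.html"),
    ("10.0.0.1", t0 + timedelta(minutes=50), "/index.html"),
]
print(summarize(sample))  # (5, 4, 3)
```

The graphic adds a hit but not a page view. The 45-minute gap starts a third session.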
Q. How do I get to my server logs?
A. You must first find out whether your server is dedicated to your web site, or whether it is multi-homed (that is, it also hosts other web sites with their own "root" web addresses, such as www.yourname.com and www.theirname.com). The answer will tell you one of two things. Either all the logs you find on the server represent your web site (you're in luck), or they might contain logs for multiple sites (your logs _might_ be mixed in). Also, if your web site is part of a larger whole, your logs are likely to be interspersed with those of the whole site.
If you don't run the server yourself (or don't know the answer), ask the person running the server (through your ISP, employer, etc.) about the state of your logs. In the worst-case scenario, where you have no server manager to ask, you should be able to tell by looking at the logs themselves whether your logs are separate or mixed with other sites.
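If you can read the log files, a quick tally of the site named in each entry shows whether they're mixed. The Python sketch below assumes comma-separated entries with the site's name or address in the seventh field, as in the IIS-style layout described later in this article (the Server IP address field). The helper `sites_in_log` is hypothetical, and the sample lines are made up.

```python
from collections import Counter

SITE_FIELD = 6  # assumption: 0-based index of the server/site field


def sites_in_log(lines, site_field=SITE_FIELD):
    """Count how many log entries name each site."""
    counts = Counter()
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) > site_field:  # skip blank or truncated lines
            counts[fields[site_field]] += 1
    return counts


sample = [
    "host1.example.com, -, 7/11/99, 9:12:03, W3SVC, WEB1, "
    "www.yourname.com, 120, 300, 5000, 200, 0, GET, /index.htm,",
    "host2.example.com, -, 7/11/99, 9:13:10, W3SVC, WEB1, "
    "www.theirname.com, 95, 310, 4200, 200, 0, GET, /index.htm,",
    "host1.example.com, -, 7/11/99, 9:14:44, W3SVC, WEB1, "
    "www.yourname.com, 80, 290, 1500, 200, 0, GET, /about.htm,",
]
counts = sites_in_log(sample)
print(dict(counts))  # {'www.yourname.com': 2, 'www.theirname.com': 1}
print("mixed" if len(counts) > 1 else "dedicated")  # mixed
```

If more than one name shows up, your entries are interspersed with those of another site.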
Q. Okay, I've determined my situation (or will once I see the logs). What next?
A. Next you'll have to find the actual folder the logs reside in on the server. While I can't tell you the exact location (every server has the possibility of being organized differently), I can suggest the following strategies:
- Ask your ISP or server manager for log access and location, or
- Do a search/find on your server for folders/files called "log" or "logs". Look for a folder first, as it may contain files also called "log" or "logs".
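On a machine where you can run scripts, the search can be automated. This Python sketch walks a directory tree with the standard `os.walk` function. It collects any folder or file whose name, ignoring case and extension, is "log" or "logs". The function name is hypothetical, and the folder you start from is up to you.

```python
import os
import tempfile

LOG_NAMES = {"log", "logs"}


def find_log_locations(root):
    """Return sorted paths under `root` of folders or files named log/logs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            stem = os.path.splitext(name)[0].lower()  # "Logs.txt" -> "logs"
            if stem in LOG_NAMES:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "www", "logs"))
    open(os.path.join(root, "www", "logs", "log"), "w").close()
    open(os.path.join(root, "www", "index.html"), "w").close()
    for path in find_log_locations(root):
        print(os.path.relpath(path, root))
```

Searching a whole disk this way can take a while, so start as close to your web site's folders as you can.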
You may have to look around a bit to find them, but once you do, the folder is likely to contain dozens or even hundreds of files. The files are typically organized by date. For example, our logs for the beginning of July 1999 are called:
Yours may be called something entirely different. When you open a file, you will see a collection of individual log entries for that date, one per line, containing all the data you need. An individual log entry will look something like:
a-xix.wincom.net, -, 7/11/99, 9:12:03, W3SVC, AGNR, www.agnr.umd.edu, 10453, 328, 5354, 200, 0, GET, /users/CMREC/2-6ART4.HTM, Mozilla/4.03 [en] (Win95; I), http://www.lycos.com/cgi-bin/pursuit?query=grub+control&cat=dir, -, -,
Q. Some of that looks familiar, but what does it all mean?
A. There are seven basic (or common) fields in most logs, and the server software may be configured to collect additional data, resulting (naturally) in additional fields. I've included many of the possible additional fields below for further clarification (for those of you who have them). There may be other fields not covered here.
I'll be up front about the fact that your log fields may be in a completely different order than represented above. This can really become a headache when trying to decipher which field is which. If you have access to server documentation, it should help (though in the case of MS IIS, the documentation I have really doesn't help... big surprise there). I should add that the WebTrends site has a great glossary of log-report terms (but it's still not comprehensive).
- Client's IP address:
The host is the computer (or proxy server) that requested the data (page). In our case, the host's IP address has been resolved into a host name (a-xix.wincom.net). In your case, the host may be represented by a name or by an IP number (such as 22.214.171.124). Which one you get depends on whether your server is set up to "resolve" (look up) IP addresses.
- Client's username:
"-" (in our case, no data was recorded)
This field is reserved for the user name of the person making the request. It is rarely used, and the data that appears in it can be faked, so in most cases it's worth ignoring.
- Date (mm/dd/yy):
This field can record the date in various formats.
- Time:
This field can record the time in various formats, and can include the offset from GMT (Greenwich Mean Time) at the end. Ours doesn't include the GMT offset, but if it did, it would say something like -0400 (for US Eastern Daylight Time, which is 4 hours behind GMT in the summer) or +0700.
- Service:
This indicates which of the server's Internet services handled the request. W3SVC is the name of the World Wide Web Publishing Service in Microsoft IIS.
- Computer name:
The name given to the server machine. In our case, AGNR stands for the College of Agriculture and Natural Resources.
- Server IP address (Multihome domain field):
This is either the IP address of the server or its resolved form, the host name (in our case, www.agnr.umd.edu). On a multi-homed server, this field tells you which site each request was for.
- Processing time:
How long it took the server to process the request in milliseconds.
- Bytes received:
The number of bytes received from the client (the size of the request).
- Bytes sent:
The amount of data sent to the client. If the field contains a "-" or a "0", this probably means that only header information was requested (most often by spiders and bots).
- Status code:
200 in particular reflects a successful file transfer. This code could be anything from 1xx to 5xx, depending on the outcome of the file request. In brief, the classes are:
1xx - informational (such as 100 Continue)
2xx - success
3xx - redirect (also a success)
4xx - client error (failure)
5xx - server error (failure)
(for information on specific numbers, please see this page.)
- Windows NT status code:
The documentation is of no help here, but this is the Windows NT (Win32) error code for the request. A value of 0, as in our example, means no error occurred.
- Request type:
This records the type of request from the client's browser to the server. Types of requests can include:
GET - requests the file in its entirety
HEAD - requests only the header information of the file
POST - sends data (such as the contents of a form) to the server for processing
- Target file:
This indicates the path to the requested file.
- User agent:
Mozilla/4.03 [en] (Win95; I)
Indicates which browser and platform the visitor was using. "Mozilla" is the name Netscape Navigator uses to identify itself. When there is additional info, as in "Mozilla/4.0 (compatible; MSIE 4.01; AOL 4.0; Windows 95)", the "compatible" token usually indicates another browser (here MSIE) masquerading as Netscape. Alternatively, this field could record a visiting spider.
- Referring URL:
This indicates who referred the visitor or bot to the web page, thus telling you where your visitors are coming from. In this case, the visitor came from a search engine (Lycos), and you have the added benefit of knowing which words they were using for the search ("grub control", yummy!).
- Script or dll variables:
"-" (in our case, no data was recorded)
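Once you know your field order, a short script can label each field and pull out the items discussed above. The Python sketch below assumes the comma-separated, IIS-style order of our example entry, and the field names are hypothetical labels rather than official ones. Because our log doesn't record the GMT offset, you must supply it yourself. The "-0400" used here assumes the server ran on US Eastern Daylight Time. Only the standard library is used (`urllib.parse` and `datetime`).

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Hypothetical labels, in the order of the fields described above.
FIELDS = [
    "client", "username", "date", "time", "service", "computer",
    "server", "processing_ms", "bytes_received", "bytes_sent",
    "status", "nt_status", "request_type", "target", "user_agent",
    "referrer",
]

STATUS_CLASSES = {"1": "informational", "2": "success", "3": "redirect",
                  "4": "client error", "5": "server error"}


def parse_entry(line):
    """Split one comma-separated entry into a dict of labeled fields.

    zip() stops at the shorter list, so trailing extras (script
    variables, the empty item after the final comma) are dropped."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))


def status_class(code):
    """Map a status code such as '404' to its class name."""
    return STATUS_CLASSES.get(code[:1], "unknown")


def search_words(referrer, param="query"):
    """Return the search words in a referring URL's query string, if any.

    The parameter name varies by search engine; 'query' matches the
    Lycos URL in our example. parse_qs turns '+' back into spaces."""
    if referrer in ("", "-"):
        return None
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0] if values else None


def entry_time_gmt(entry, offset):
    """Combine the date and time with a GMT offset such as '-0400'
    (supplied by you) and convert the result to GMT."""
    stamp = f"{entry['date']} {entry['time']} {offset}"
    local = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S %z")
    return local.astimezone(timezone.utc)


line = ("a-xix.wincom.net, -, 7/11/99, 9:12:03, W3SVC, AGNR, "
        "www.agnr.umd.edu, 10453, 328, 5354, 200, 0, GET, "
        "/users/CMREC/2-6ART4.HTM, Mozilla/4.03 [en] (Win95; I), "
        "http://www.lycos.com/cgi-bin/pursuit?query=grub+control&cat=dir, -, -,")
entry = parse_entry(line)
print(entry["target"])                 # /users/CMREC/2-6ART4.HTM
print(status_class(entry["status"]))   # success
print(search_words(entry["referrer"])) # grub control
print(entry_time_gmt(entry, "-0400"))  # 1999-07-11 13:12:03+00:00
```

Splitting on commas works for this entry because none of its fields contain a comma. A user agent or URL that does contain one would throw the labels off.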
Q. Do I have to go through each and every log to figure that stuff out and calculate my statistics?
A. I certainly hope not! You could try using software to analyze your server logs and create reports for you. I've used freeware including the popular Analog and the less known WWWStat. Netscape has a comprehensive list of links to similar software. I've also had the great pleasure of using WebTrends Log Analyzer, a top-of-the-line tool that has a wide variety of capabilities and functions. I have found it worth the money, especially because my employer paid for it.
If you want a good utility for finding and counting your log information, I hear "grep" is a great tool built into UNIX boxes, and downloadable for the Mac. I've not used it, so I can't personally vouch for it. A final alternative would be to use the search or find feature of your text editor to look for particular values. I'm not going to go into the specifics of any of these options, in the hope that any documentation that comes with the software or your system will be enough to help you perform your chosen task. When all else fails, call tech support.
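On UNIX, `grep -c PATTERN FILE` prints the number of lines in FILE that match PATTERN. If grep isn't handy, the same kind of count takes only a few lines of Python. The sketch below uses hypothetical helper names, and its patterns assume the comma-separated layout of our example entry. The sample lines are made up.

```python
import re


def count_matching(lines, pattern):
    """Count lines that match a regular expression, like `grep -c`."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


def count_in_file(path, pattern):
    """Apply count_matching to a log file, one entry per line."""
    # latin-1 decodes any byte, so odd characters won't stop the count.
    with open(path, encoding="latin-1") as log:
        return count_matching(log, pattern)


sample = [
    "host1.example.com, -, 7/11/99, 9:12:03, W3SVC, WEB1, "
    "www.yourname.com, 120, 300, 5000, 200, 0, GET, /index.htm,",
    "host2.example.com, -, 7/11/99, 9:13:10, W3SVC, WEB1, "
    "www.yourname.com, 95, 310, 0, 404, 2, GET, /missing.htm,",
    "host1.example.com, -, 7/11/99, 9:14:44, W3SVC, WEB1, "
    "www.yourname.com, 80, 290, 1500, 200, 0, GET, /index.htm,",
]
print(count_matching(sample, r"/index\.htm"))  # 2 (requests for one page)
# Caution: this pattern would also match any other field equal to 404.
print(count_matching(sample, r",\s*404,"))     # 1 (files not found)
```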
If you have any questions or comments, please don't hesitate to contact me.