Tracking Users
HTTP logs are collected by HTTP servers such as Apache. Generally these logs use the Common Logfile Format (WWW Consortium, n.d.a). They record which pages are accessed, the date, which IP address made the request, and optionally other information such as the referring page. These logs are a relatively simple way of measuring what users are accessing. However, they give only partial information. They show only the requests that actually reached the server: many organisations now use proxy caches, and if there is a “hit” on a cache, the request is handled by the proxy and never reaches the source server. This can be alleviated by setting the Expires time for each document to zero, but doing so defeats the purpose of caching.
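As an illustration of what such a log entry contains, the following is a minimal sketch of parsing one line in the Common Logfile Format with Python. The function name `parse_clf_line`, the sample line, and the optional trailing referrer field are assumptions made for this example; real server configurations may log additional or differently ordered fields.

```python
import re
from datetime import datetime

# Common Logfile Format: host ident authuser [date] "request" status bytes
# The optional quoted field after the byte count is assumed here to be the
# referring page, as some extended log configurations append it.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)")?'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A "-" in the size field means no body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical sample line, not taken from a real log
sample = '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_clf_line(sample)
print(record["host"], record["request"], record["status"])
```

Note that a request answered by a proxy cache or a browser cache never produces such a line at all, so counting parsed records gives only a lower bound on how often a page was viewed.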
If the user uses the Back button in the browser, the document will be retrieved from the browser’s own cache, so no request reaches the server and nothing is logged. This cannot be avoided except by disabling the Back button.
The principal problem is that server logs can show only that a page was requested from a server. What is done with that page is unknown. A user may examine it for a long time or simply discard it. Further, it is not clear whether the request came from a human using a browser or from some automated agent such as a spider.
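One partial heuristic for the human-versus-agent question, not described in the source and offered only as a sketch, is to flag any host that has requested `/robots.txt`, since well-behaved spiders fetch that file before crawling. The function name `likely_automated_hosts` and the sample data are hypothetical, and the approach misses agents that ignore `/robots.txt` entirely.

```python
def likely_automated_hosts(requests):
    """Return the set of hosts that requested /robots.txt.

    `requests` is an iterable of (host, path) pairs, for example
    extracted from parsed log records. This is a heuristic only:
    it cannot detect agents that never fetch /robots.txt.
    """
    return {host for host, path in requests if path == "/robots.txt"}


# Hypothetical sample data
sample_requests = [
    ("192.0.2.10", "/index.html"),
    ("198.51.100.7", "/robots.txt"),
    ("198.51.100.7", "/index.html"),
    ("192.0.2.10", "/about.html"),
]
flagged = likely_automated_hosts(sample_requests)
print(flagged)
```

Requests from flagged hosts could then be excluded before counting page accesses, though the remaining counts still say nothing about what a human reader did with the page.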