This chapter describes how Web usage patterns can be used to improve the navigational structure of a Web site. The discussion begins with an illustration of visualization tools for studying aggregate and individual link traversals. The chapter then discusses how data mining techniques such as classification, association, and sequence analysis can be used to discover knowledge about Web usage, such as navigational patterns. Finally, it presents a graph-theoretic algorithm that creates an optimal navigational hyperlink structure from known navigation patterns. The discussion is supported by analysis of real-world datasets.
Web usage mining applies data mining techniques to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. While Web content mining and Web structure mining use the information found in Web documents, Web usage mining uses secondary data generated by users' interaction with the Web. Web access logs, which are available on most Web servers, are typical examples of the datasets used in Web usage mining. Other Web usage data may include browser logs, user profiles, registration files, user sessions or transactions, user queries, bookmark folders, as well as mouse clicks and scrolls (Kosala and Blockeel, 2000). Web usage mining includes creating user profiles as well as analyzing user access patterns and navigation paths.
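Before access patterns or navigation paths can be analyzed, raw access-log lines are usually grouped into user sessions. The following is a minimal sketch, not the chapter's own procedure. It assumes the logs use the NCSA Common Log Format. It also assumes that the client host identifies a user, which is only an approximation when proxies or shared addresses are involved, and that a gap of more than 30 minutes starts a new session, a commonly used heuristic that is assumed here. The names `parse_line`, `sessionize`, and `SAMPLE_LOG` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Regex for the NCSA Common Log Format:
# host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return (host, timestamp, path) for a CLF line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("path")

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requested paths into sessions.

    Assumption: one host = one user, and a new session starts when the
    gap since that host's previous request exceeds `timeout`.
    """
    records = [r for r in (parse_line(line) for line in lines) if r is not None]
    records.sort(key=lambda r: (r[0], r[1]))  # order by host, then by time
    sessions = []
    last_host, last_time = None, None
    for host, ts, path in records:
        if host != last_host or ts - last_time > timeout:
            sessions.append([])  # start a new session
        sessions[-1].append(path)
        last_host, last_time = host, ts
    return sessions

# Hypothetical sample log lines, for illustration only
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:57:10 -0700] "GET /products.html HTTP/1.0" 200 1045',
    '10.0.0.1 - - [10/Oct/2000:15:00:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:00 -0700] "GET /about.html HTTP/1.0" 200 512',
]

print(sessionize(SAMPLE_LOG))
```

In this sample, the third request from `10.0.0.1` arrives more than an hour after the second, so it opens a new session. The request from `10.0.0.2` forms a session of its own.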
Before applying data mining techniques, it is essential to understand the dataset. This is typically done by creating multiple summary reports and, where possible, visual representations. Before writing programs to analyze Web access logs, one may want to consider the analysis tools that are already available, since they may answer most questions about the usage of a Web site. The Open Directory Web site (http://dmoz.org/) lists freeware and open source Web access analysis tools, as well as commercial ones. This section discusses how to obtain summary reports, visualizations of the aggregate clickstream, and individual user sessions from Web access logs.
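Two of the simplest summaries mentioned here are a page-hit report and an aggregate clickstream, which is the number of times users followed each page-to-page transition. The sketch below computes both from sessions that have already been extracted. It assumes each session is a list of requested paths in time order. The function names and `SAMPLE_SESSIONS` data are hypothetical and are not output from any of the tools the Open Directory lists.

```python
from collections import Counter

def page_hits(sessions):
    """Summary report: number of requests per page across all sessions."""
    return Counter(page for session in sessions for page in session)

def transition_counts(sessions):
    """Aggregate clickstream: how often users moved directly from page a to page b."""
    return Counter(
        (a, b) for session in sessions for a, b in zip(session, session[1:])
    )

# Hypothetical sessions, each a time-ordered list of requested paths
SAMPLE_SESSIONS = [
    ["/index.html", "/products.html", "/order.html"],
    ["/index.html", "/products.html"],
    ["/index.html", "/about.html"],
]

hits = page_hits(SAMPLE_SESSIONS)
clicks = transition_counts(SAMPLE_SESSIONS)

print("Hits per page:")
for page, n in hits.most_common():
    print(f"{n:3d}  {page}")

print("Transitions:")
for (src, dst), n in clicks.most_common():
    print(f"{n:3d}  {src} -> {dst}")
```

The transition counts are the raw material for clickstream visualizations. They can also serve as edge weights on a graph of the site.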
Key Terms in this Chapter
Minimal Spanning Tree: A tree that connects all the nodes in a weighted graph and has the minimum sum of edge weights among all such trees.
Web Usage Mining: Extracting useful information from Web usage data.
Web Graph: A graph-theoretic representation of a Web site, in which pages are nodes and hyperlinks are edges.
Web Structure Mining: Extracting useful information from the hyperlinked Web structure.
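To make the Web graph and minimal spanning tree terms concrete, the sketch below applies Kruskal's algorithm to a small Web graph. It is only an illustration of these definitions, and the chapter's own algorithm for building an optimal navigational structure may differ. The example makes several assumptions. Links are treated as undirected. Each edge weight is taken to be the reciprocal of a hypothetical transition count, so frequently traversed links are cheaper and more likely to be kept. The page names and counts are invented for illustration.

```python
def kruskal_mst(nodes, edges):
    """Return the edges of a minimal spanning tree using Kruskal's algorithm.

    nodes: iterable of node labels (e.g. page names)
    edges: list of (weight, u, v) tuples for an undirected, connected graph
    """
    parent = {n: n for n in nodes}  # union-find forest, one set per node

    def find(x):
        # Path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):  # consider the cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v are in different components, so no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical transition counts between pages of a small site
TRANSITIONS = {
    ("index", "products"): 40,
    ("index", "about"): 10,
    ("products", "order"): 25,
    ("about", "contact"): 5,
    ("index", "contact"): 8,
    ("products", "about"): 2,
}

PAGES = {p for pair in TRANSITIONS for p in pair}
# Assumed weighting: weight = 1 / count, so heavily used links are preferred
EDGES = [(1.0 / count, u, v) for (u, v), count in TRANSITIONS.items()]

mst = kruskal_mst(PAGES, EDGES)
for u, v, w in mst:
    print(f"{u} -- {v}  (weight {w:.3f})")
```

With reciprocal weights, the minimal spanning tree is the same as a maximum spanning tree over the raw counts. The result keeps the most heavily traveled links needed to reach every page.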