Combining Data Warehousing and Data Mining Techniques for Web Log Analysis
Torben Pedersen (Aalborg University, Denmark), Jesper Thorhauge (Conzentrate, Denmark) and Søren Jespersen (Conzentrate, Denmark)
Copyright: © 2007
Enormous amounts of information about Web site user behavior are collected in Web server logs. However, this information is only useful if it can be queried and analyzed to provide high-level knowledge about user navigation patterns, a task that requires powerful techniques. This chapter presents a number of approaches that combine data warehousing and data mining techniques to analyze Web logs. After introducing the well-known click and session data warehouse (DW) schemas, the chapter presents the subsession schema, which allows fast queries on sequences of page visits. The chapter then presents the so-called “hybrid” technique, which combines DW Web log schemas with a data mining technique called Hypertext Probabilistic Grammars, thereby providing fast and flexible constraint-based Web log analysis. Finally, the chapter presents a “post-check enhanced” improvement of the hybrid technique.