This chapter aims to provide an overview of the use of statistical methods in support of Web Usage Mining. The first part describes the framework of Web Usage Mining as the branch of Web Mining devoted to the study of how a Website is used. Then, the data that are the object of the analysis are described in detail, together with the problems related to their pre-processing. Once the origin of the data and their treatment for a correct Web Usage analysis have been clarified, the focus shifts to the statistical techniques that can be applied in this setting, with particular reference to binary segmentation methods. These methods discriminate among users through a response variable that determines each user's membership in a group, on the basis of characteristics observed on the same users.
Web Usage Mining
Web Usage Mining has been defined as the application of data mining techniques to large Web data repositories in order to extract usage patterns, namely the behavior of visitors. As a further step, pattern discovery and pattern analysis make it possible to profile users and their preferences, and statistical methods play a fundamental role here. Ultimately, they allow the identification of suitable attributes and of the main features characterizing each type of user, thus enabling Web personalization.
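Before patterns can be discovered, the raw records in a Web data repository are typically grouped into user sessions. The sketch below shows one simple way to do this, assuming hypothetical log records of the form (user, timestamp, page) and a 30-minute inactivity timeout; both the record layout and the threshold are illustrative assumptions, not part of the method described in this chapter.

```python
from datetime import datetime, timedelta

# Hypothetical log records: (user_id, timestamp, requested page).
# In practice these would be parsed from a Web server access log.
LOG = [
    ("u1", "2024-01-10 09:00:00", "/home"),
    ("u1", "2024-01-10 09:05:00", "/products"),
    ("u1", "2024-01-10 11:00:00", "/home"),
    ("u2", "2024-01-10 09:02:00", "/home"),
    ("u2", "2024-01-10 09:10:00", "/contact"),
]

# Assumed inactivity threshold: a longer gap starts a new session.
TIMEOUT = timedelta(minutes=30)

def sessionize(records, timeout=TIMEOUT):
    """Group log records into per-user sessions (lists of pages)."""
    # Sort by user first, then by time, so each user's requests are contiguous.
    parsed = sorted(
        (user, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), page)
        for user, ts, page in records
    )
    sessions = []
    last_user, last_time = None, None
    for user, time, page in parsed:
        # Open a new session for a new user or after a long pause.
        if user != last_user or time - last_time > timeout:
            sessions.append((user, []))
        sessions[-1][1].append(page)
        last_user, last_time = user, time
    return sessions

for user, pages in sessionize(LOG):
    print(user, pages)
```

Here user u1 produces two sessions, because the two-hour pause before the third request exceeds the assumed timeout.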
This chapter deals with Web Usage Mining, focusing on statistical methods for user profiling; among them, binary segmentation, or tree-based modeling, will be considered in detail.
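As a rough illustration of a single step of binary segmentation, the sketch below searches for the predictor and cutoff that best split users into two groups with respect to a binary response. It assumes hypothetical predictors (pages viewed, minutes on site), a hypothetical response (whether the user made a purchase) and the Gini impurity as splitting criterion; other criteria are possible, and the segmentation methods discussed in this chapter may differ.

```python
# Hypothetical users: predictors observed on each user and a binary
# response (1 = the user completed a purchase, 0 = did not).
USERS = [
    {"pages": 2, "minutes": 1.0, "buyer": 0},
    {"pages": 3, "minutes": 2.5, "buyer": 0},
    {"pages": 4, "minutes": 8.0, "buyer": 0},
    {"pages": 8, "minutes": 3.0, "buyer": 1},
    {"pages": 10, "minutes": 9.0, "buyer": 1},
    {"pages": 12, "minutes": 6.0, "buyer": 1},
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, predictors, response):
    """Return (predictor, cutoff, impurity decrease) of the best binary split."""
    parent = [r[response] for r in rows]
    n = len(rows)
    best = (None, None, 0.0)
    for var in predictors:
        values = sorted({r[var] for r in rows})
        # Candidate cutoffs: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r[response] for r in rows if r[var] <= cut]
            right = [r[response] for r in rows if r[var] > cut]
            # Decrease = parent impurity minus weighted child impurity.
            decrease = gini(parent) - (
                len(left) / n * gini(left) + len(right) / n * gini(right)
            )
            if decrease > best[2]:
                best = (var, cut, decrease)
    return best

print(best_split(USERS, ["pages", "minutes"], "buyer"))
```

Applying best_split again to each of the two resulting groups, until some stopping rule is met, would grow the full tree; stopping rules and pruning are omitted here.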
Web Mining Branches
In the framework of Web Mining, User Profiling represents a fundamental application. Nowadays, huge amounts of information flow over the Internet and through Web servers during their various activities. The knowledge discovery process and the extraction of useful information from the Internet depend both on the skill of the Web user and on the performance of the search engine. Recently, scientific research has focused on suitable procedures for profiling different types of users by analyzing similarities and dissimilarities in their Internet behavior (Berendt, Hotho, Mladenic, van Someren, Spiliopoulou, Stumme, 2003).
User Profiling is conceptually the act of building up a profile of who the users are and what they want to do. These profiles are used to group users and to identify the priorities in their activities. Knowing who the users are and what they want is a vital step in meeting their needs.
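One simple way to make a profile concrete is to represent each user by the share of page views falling in each content category, and then to compare profiles in order to measure similarities and dissimilarities in behavior. The sketch below assumes hypothetical categories and visit data and uses cosine similarity as an illustrative measure; it is not the profiling procedure of any specific study.

```python
import math
from collections import Counter

# Hypothetical page categories visited by each user (one entry per page view).
VISITS = {
    "u1": ["news", "news", "sport", "news"],
    "u2": ["news", "sport", "news", "news"],
    "u3": ["shop", "shop", "sport"],
}
CATEGORIES = ["news", "sport", "shop"]

def build_profile(pages, categories=CATEGORIES):
    """Profile = share of a user's page views falling in each category."""
    counts = Counter(pages)
    total = len(pages)
    return [counts[c] / total for c in categories]

def cosine(a, b):
    """Cosine similarity of two profiles (1 = same mix of categories)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

profiles = {user: build_profile(pages) for user, pages in VISITS.items()}
print(profiles["u1"])
print(round(cosine(profiles["u1"], profiles["u2"]), 3))
print(round(cosine(profiles["u1"], profiles["u3"]), 3))
```

Users u1 and u2 visit pages in a different order but with the same mix of categories, so their profiles coincide, while u3 shows a clearly different behavior.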
From the methodological point of view, User Profiling is one of the main purposes of Web Usage Mining, which is the process of applying data mining techniques to the discovery of usage patterns from Web data in various application contexts. Web Usage Mining is one branch of Web Mining, namely data mining on Web data. The other branches of Web Mining are Web Content Mining, which is the process of analyzing various aspects of the contents of a Website, such as text and graphics, and Web Structure Mining, which is the process of analyzing the structure of a Website in terms of the organization and design of its pages.
Key Terms in this Chapter
User Profiling: The process of grouping users who browse a Website with similar behavior, so that similar users can be gathered into typical categories.
Tree-Based Modeling: Partitioning algorithms that give a simple graphical representation (the decision tree) as output.
Cluster Methods: Clustering algorithms can follow either an aggregation procedure (bottom-up) or a disaggregation procedure (top-down). The goal is to group the objects.
Segmentation: An iterative process that leads to the formation of homogeneous groups with respect to a response variable.
Classification: Classification means finding a partition of objects into groups that are internally homogeneous and externally heterogeneous, so that objects within each group are similar.
Association Rules: A methodology used to discover co-occurrences between two or more items in a large dataset. In Web Mining, association rules are used to identify groups of pages that are jointly consulted within a set of browsing sessions.
Web Usage Mining: The branch of Web Mining devoted to the study of how a Website is used.
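The association rules defined above can be illustrated on pairs of pages. The sketch below computes support (the share of sessions containing both pages) and confidence (the share of sessions containing the first page that also contain the second), keeping the rules that pass both thresholds. The sessions and thresholds are hypothetical, and only page pairs are considered; algorithms such as Apriori extend the same idea to larger sets of pages.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages consulted in each browsing session.
SESSIONS = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/contact"},
    {"/products", "/cart"},
]

def pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Return rules (A, B, support, confidence) for A -> B over page pairs.

    support(A, B)    = share of sessions containing both A and B
    confidence(A->B) = sessions with A and B / sessions with A
    """
    n = len(sessions)
    item_counts = Counter(page for s in sessions for page in s)
    # Sorting makes each unordered pair appear under a single key.
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields a candidate rule in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, sup, conf in pair_rules(SESSIONS):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Note that confidence is not symmetric: every session containing /cart also contains /products, but not the other way round.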