Improving Web Clickstream Analysis: Markov Chains Models and Genmax Algorithms

Paolo Baldini (University of Pavia, Italy) and Paolo Giudici (University of Pavia, Italy)
Copyright: © 2008 | Pages: 11
DOI: 10.4018/978-1-59904-528-3.ch014

Abstract

Every time a user connects to a web site, the server records all the transactions performed in a log file. What is captured is the "click flow" (clickstream) of the mouse and the keys used by the user while navigating the site. Usually every click of the mouse corresponds to the viewing of a web page. The objective of this chapter is to show how web clickstream data can be used to understand the most likely navigation paths in a web site, with the aim of predicting, possibly online, which pages will be viewed given the path of pages already visited. Such analysis is useful for estimating, for instance, the probability of reaching a page of interest (such as the buying page of an e-commerce site) from another page, or the probability of entering (or exiting) the web site at any particular page. From a methodological viewpoint, we present two main research contributions. On the one hand, we show how to improve the efficiency of the Apriori algorithm; on the other hand, we show how Markov chain models can be usefully developed and implemented for web usage mining. In both cases we compare the results with those obtained with classical association rule algorithms and models.
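The page-to-page, entry, and exit probabilities mentioned in the abstract can be illustrated with a first-order Markov chain estimated from session paths. The following is a minimal sketch, not the authors' implementation: the `START` and `END` marker states, the function name `transition_probabilities`, and the toy sessions are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical marker states for entering and leaving the site.
START, END = "<start>", "<end>"


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities from page sequences.

    Each session is a list of page names in the order they were viewed.
    Returns a nested dict: probs[src][dst] = relative frequency of the
    move src -> dst among all moves leaving src.
    """
    counts = defaultdict(Counter)
    for pages in sessions:
        path = [START, *pages, END]
        # Count every consecutive pair of states in the session.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Toy sessions invented for illustration; real input would come from server logs.
sessions = [
    ["home", "catalog", "cart", "buy"],
    ["home", "catalog", "home"],
    ["catalog", "cart", "buy"],
    ["home", "cart"],
]

probs = transition_probabilities(sessions)

# Probability of reaching the buying page from the cart page.
print(probs["cart"].get("buy", 0.0))
# Probability of entering the site at each page.
print(probs[START])
# Probability of exiting the site from each page.
print({page: row[END] for page, row in probs.items() if END in row})
```

In this sketch each row of `probs` sums to one, so `probs[START]` reads as an entry distribution and `probs[page][END]` as an exit probability. A first-order chain conditions only on the current page; conditioning on a longer path of previously seen pages would require a higher-order model.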

Complete Chapter List

Table of Contents
Foreword
Evangelos Triantaphyllou
Acknowledgment
Giovanni Felici, Carlo Vercellis
Chapter 1
Jonathan Mugan, Klaus Truemper
Frequently, one wants to extend the use of a classification method that, in principle, requires records with True/False values, so that records with...
Discretization of Rational Data
Chapter 2
Massimo Liquori, Andrea Scozzari
Traditional classification approaches consider a dataset formed by an archive of observations classified as positive or negative according to a...
Vector DNF for Datasets Classifications: Application to the Financial Timing Decision Problem
Chapter 3
Xenia Naidenova
The purpose of this paper is to demonstrate the possibility of transforming a large class of machine learning algorithms into commonsense reasoning...
Reducing a Class of Machine Learning Algorithms to Logical Commonsense Reasoning Operations
Chapter 4
Giovanni Felici, Valerio Gatta
The analysis of quality of services is an important issue for the planning and the management of many businesses. The ability to address the demands...
The Analysis of Service Quality Through Stated Preference Models and Rule-Based Classification
Chapter 5
Brian C. Lovell, Christian J. Walder
This chapter discusses the use of support vector machines (SVM) for business applications. It provides a brief historical background on inductive...
Support Vector Machines for Business Applications
Chapter 6
Ali Smith, Kate A. Smith
The most critical component of kernel based learning algorithms is the choice of an appropriate kernel and its optimal parameters. In this paper we...
Kernel Width Selection for SVM Classification: A Meta-Learning Approach
Chapter 7
Carlotta Orsenigo, Carlo Vercellis
In the context of biolife science, predicting the folding structure of a protein plays an important role for investigating its function and...
Protein Folding Classification Through Multicategory Discrete SVM
Chapter 8
Li Liao
Recently, clustering and classification methods have seen many applications in bioinformatics. Some are simply straightforward applications of...
Hierarchical Profiling, Scoring, and Applications in Bioinformatics
Chapter 9
Monica Chis
Clustering is an important technique used in discovering some inherent structure present in data. The purpose of cluster analysis is to partition a...
Hierarchical Clustering Using Evolutionary Algorithms
Chapter 10
T. Warren Liao
In this chapter, we present genetic algorithm (GA) based methods developed for clustering univariate time series with equal or unequal length as an...
Exploratory Time Series Data Mining by Genetic Clustering
Chapter 11
Alex Burns, Shital Shah, Andrew Kusiak
This paper presents a hybrid approach that integrates a genetic algorithm (GA) and data mining to produce control signatures. The control signatures...
Development of Control Signatures with a Hybrid Data Mining and Genetic Algorithm
Chapter 12
Enrico Fagiuoli, Sara Omerino, Fabio Stella
The importance of data cleaning and data quality is becoming increasingly clear as evidenced by the surge in software, tools, consulting companies...
Bayesian Belief Networks for Data Cleaning
Chapter 13
Chuck P. Lam, David G. Stork
Data quality is an important factor in building effective classifiers. One way to improve data quality is by cleaning labeling noise. Label cleaning...
A Comparison of Revision Schemes for Cleaning Labeling Noise
Chapter 14
Paolo Baldini, Paolo Giudici
Every time a user links up to a web site, the server keeps track of all the transactions accomplished in a log file. What is captured is the "click...
Improving Web Clickstream Analysis: Markov Chains Models and Genmax Algorithms
Chapter 15
Antonino Staiano, Lara De Vinco, Giuseppe Longo, Roberto Tagliaferri
Probabilistic Principal Surfaces (PPS) is a non linear latent variable model with very powerful visualization and classification capabilities which...
Advanced Data Mining and Visualization Techniques with Probabilistic Principal Surfaces: Applications to Astronomy and Genetics
Chapter 16
Mehmed Kantardzic, Pedram Sadeghian, Walaa M. Sheta
Advances in computing techniques, as well as the reduction in the cost of technology have made possible the viability and spread of large virtual...
Spatial Navigation Assistance System for Large Virtual Environments: The Data Mining Approach
Chapter 17
Antonio Congiusta, Domenico Talia, Paolo Trunfio
Knowledge discovery is a compute and data intensive process that allows for finding patterns, trends, and models in large datasets. The Grid can be...
Using Grids for Distributed Knowledge Discovery
Chapter 18
Nikos Pelekis, Babis Theodoulidis, Ioannis Kopanakis, Yannis Theodoridis
QOSP Quality of Service Open Shortest Path First based on QoS routing has been recognized as a missing piece in the evolution of QoS-based services...
Fuzzy Miner: Extracting Fuzzy Rules from Numerical Patterns
Chapter 19
Yanbing Liu, Menghao Wang, Jong Tang
QOSPF (Quality of Service Open Shortest Path First) based on QoS routing has been recognized as a missing piece in the evolution of QoS-based...
Routing Attribute Data Mining Based on Rough Set Theory
About the Contributors