Efficient Web Mining for Traversal Path Patterns

Efficient Web Mining for Traversal Path Patterns

Zhixiang Chen (The University of Texas-Pan American, USA), Richard H. Fowler (The University of Texas-Pan American, USA), Ada Wai-Chee Fu (The Chinese University of Hong Kong, Hong Kong) and Chunyue Wang (The University of Texas-Pan American, USA)
Copyright: © 2005 |Pages: 17
DOI: 10.4018/978-1-59140-414-9.ch015
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

A maximal forward reference of a Web user is a longest consecutive sequence of Web pages visited by the user in a session without revisiting some previously visited page in the sequence. Efficient mining of frequent traversal path patterns, that is, large reference sequences of maximal forward references, from very large Web logs is a fundamental problem in Web mining. This chapter aims at designing algorithms for this problem with the best possible efficiency. First, two optimal linear time algorithms are designed for finding maximal forward references from Web logs. Second, two algorithms for mining frequent traversal path patterns are devised with the help of a fast construction of shallow generalized suffix trees over a very large alphabet. These two algorithms have respectively provable linear and sublinear time complexity, and their performances are analyzed in comparison with the a priori-like algorithms and the Ukkonen algorithm. It is shown that these two new algorithms are substantially more efficient than the a priori-like algorithms and the Ukkonen algorithm.

Complete Chapter List

Search this Book:
Reset