Data Cleansing and Validation for Multiple Site Link Structure Analysis

Mike Thelwall
Copyright: © 2005 | Pages: 20
DOI: 10.4018/978-1-59140-414-9.ch010

Abstract

A range of techniques is described for cleansing and validating link data for use in different types of Web structure mining, and some applications are given. The main application area is Multiple Site Link Structure Analysis, which typically involves mining patterns from themed collections of Websites. The importance of data cleansing and validation stems from the fact that Web data are typically very messy: they involve extensive duplication of pages and page components, so analyzing raw Web data may give meaningless results.
