Restrictive Methods and Meta Methods for Thematically Focused Web Exploration

Sergej Sizov (University of Koblenz-Landau, Germany) and Stefan Siersdorfer (University of Sheffield, UK)
Copyright: © 2008 | Pages: 19
DOI: 10.4018/978-1-59904-847-5.ch023

Abstract

This chapter addresses the problem of automatically organizing heterogeneous collections of Web documents in order to build thematically focused expert search engines and portals. As a possible application scenario for our techniques, we consider a focused Web crawler that populates topics of interest by automatically categorizing newly fetched documents. The underlying supervised (classification) and unsupervised (clustering) methods achieve higher accuracy by leaving out uncertain documents rather than assigning them to inappropriate topics or clusters with low confidence. We introduce a formal probabilistic model for ensemble-based meta methods and explain how it can be used for constructing estimators and for quality-oriented tuning. Furthermore, we provide a comprehensive experimental study of the proposed meta methodology and realistic use-case examples.
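
The restrictive idea summarized above can be illustrated with a minimal sketch. It is not the chapter's probabilistic model: it assumes a simple agreement rule in which an ensemble assigns a topic only when a sufficient fraction of its members vote for it, and otherwise leaves the document out. The function `restrictive_vote`, the parameter `min_agreement`, and the three keyword-based toy classifiers are hypothetical names introduced only for this illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# A classifier is modeled here as any function mapping a document to a topic.
Classifier = Callable[[str], str]


def restrictive_vote(
    doc: str,
    classifiers: Sequence[Classifier],
    min_agreement: float = 1.0,
) -> Optional[str]:
    """Return the majority topic, or None (abstain) if agreement is too low.

    min_agreement is an assumed tuning knob: 1.0 demands a unanimous vote,
    lower values trade accuracy for the share of documents that get a topic.
    """
    votes = Counter(clf(doc) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    if count / len(classifiers) >= min_agreement:
        return label
    return None  # uncertain document: leave it out rather than guess


# Hypothetical toy classifiers, each keyed on a single word.
def by_match(doc: str) -> str:
    return "sports" if "match" in doc else "science"


def by_goal(doc: str) -> str:
    return "sports" if "goal" in doc else "science"


def by_team(doc: str) -> str:
    return "sports" if "team" in doc else "science"


ENSEMBLE = [by_match, by_goal, by_team]

clear_doc = "the team scored a goal in the match"
ambiguous_doc = "the match between theory and experiment"

unanimous_clear = restrictive_vote(clear_doc, ENSEMBLE)
unanimous_ambiguous = restrictive_vote(ambiguous_doc, ENSEMBLE)
relaxed_ambiguous = restrictive_vote(ambiguous_doc, ENSEMBLE, min_agreement=0.6)
```

Under the unanimous setting the ambiguous document is left unassigned, while relaxing `min_agreement` to 0.6 lets a two-out-of-three majority decide. Lowering the threshold increases the share of documents that receive a topic at the risk of more wrong assignments, which is the kind of trade-off that quality-oriented tuning has to balance.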
