Machine Learning Techniques for Wrapper Maintenance

Kristina Lerman (University of Southern California, USA), Steven N. Minton (Fetch Technologies Inc., USA) and Craig A. Knoblock (Fetch Technologies Inc. & University of Southern California, USA)
DOI: 10.4018/978-1-59140-405-7.ch017

The proliferation of online information has led to an increased use of wrappers for extracting data from Web sources and transforming it into a structured format. The resulting data can then be used to build new enterprise applications. Although most previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important problem because Web sources often change in ways that prevent wrappers from operating correctly. In this chapter, we describe machine learning techniques for verifying that a wrapper is working correctly and for repairing it when it is not. Our approach is to learn structural descriptions of data and use these descriptions to verify that the wrapper is correctly extracting data. The repair algorithm automatically recovers from Web source format changes by identifying the data so that a new wrapper may be generated for the source.
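To make the verification idea concrete, the following is a minimal sketch of one way to learn a structural description of a data field and use it to check later extractions. It is an illustration under stated assumptions, not the algorithm presented in the chapter: the coarse token types (NUM, CAP, LOWER, PUNCT), the three-token signature, the 0.8 match threshold, and the function names `token_type`, `signature`, `learn_description`, and `verify` are all hypothetical choices made for this example.

```python
import re
from collections import Counter


def token_type(token):
    """Map a token to a coarse syntactic type (hypothetical type set)."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "LOWER"
    return "PUNCT"


def signature(value, max_tokens=3):
    """Describe a value by the types of its first few tokens."""
    # Split into runs of letters, runs of digits, or single other characters.
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", value)
    return tuple(token_type(t) for t in tokens[:max_tokens])


def learn_description(examples):
    """Learn the relative frequency of each signature among known-good values."""
    counts = Counter(signature(v) for v in examples)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def verify(description, extracted, min_match=0.8):
    """Return True if enough extracted values match a learned signature."""
    if not extracted:
        return False  # an empty extraction is treated as a failure
    matched = sum(1 for v in extracted if signature(v) in description)
    return matched / len(extracted) >= min_match


# Values the wrapper extracted while it was known to work (made-up data).
known_good = ["(310) 555-1234", "(213) 555-9876", "(818) 555-0000"]
description = learn_description(known_good)

# Later extractions: one consistent with the description, one not.
still_working = verify(description, ["(626) 555-4321", "(909) 555-1111"])
after_change = verify(description, ["Call us today", "Open 9 to 5"])
```

In this sketch, `still_working` is `True` and `after_change` is `False`, because the text extracted after the hypothetical format change no longer begins with the punctuation-number-punctuation pattern learned from the phone numbers. A failed check of this kind is the signal that the wrapper needs repair.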
