Rule-Based Parsing for Web Data Extraction


David Camacho (Universidad Carlos III de Madrid, Spain), Ricardo Aler (Universidad Carlos III de Madrid, Spain) and Juan Cuadrado (Universidad Carlos III de Madrid, Spain)
Copyright: © 2004 | Pages: 23
DOI: 10.4018/978-1-59140-194-0.ch005


Building intelligent, robust applications that work with the information stored on the Web is difficult for several reasons that arise from the essential nature of the Web: the information is highly distributed, it is dynamic (in both content and format), it is usually not well structured, and web sources are at times unreachable. Building robust and adaptable web systems requires a standard representation for the information (for example, languages such as XML and ontologies to represent the semantics of the stored knowledge). However, this is still an open research field, and most web sources do not provide their information in a structured way. This chapter analyzes a new approach that builds robust and adaptable web systems from multiple agents. Several problems, including how to retrieve, extract, and manage the information stored in web sources, are analyzed from an agent perspective. Two difficult problems are addressed in this chapter: designing a general architecture for managing web information sources, and enabling these agents to work semiautomatically, adapting their behavior to the dynamic conditions of the electronic sources. To achieve the first goal, a generic web-based multi-agent system (MAS) is proposed and applied to a specific problem: retrieving and managing information from electronic newspapers. To partially solve the problem of retrieving and extracting web information, a semiautomatic web parser is designed and deployed as a reusable software component. This parser uses two sets of rules to adapt the behavior of the web agent to possible changes in the web sources. The first set defines the knowledge to be extracted from the HTML pages; the second represents the final structure in which the retrieved knowledge is stored. Using this parser, a specific web-based multi-agent system is implemented.
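
The two rule sets mentioned in the abstract can be illustrated with a minimal sketch. This is not the chapter's parser: the rule format, the field names, the regular expressions, and the sample HTML below are all assumptions made for illustration. The idea shown is only that extraction rules and storage rules are kept as data, so a change in a web source can be absorbed by editing a rule rather than the code.

```python
import re
from typing import Dict, List

# Hypothetical extraction rules: field name -> regex with one capture group.
# They describe WHAT to pull out of the HTML page.
EXTRACTION_RULES: Dict[str, str] = {
    "headline": r"<h2[^>]*>(.*?)</h2>",
    "author": r'<span class="author">(.*?)</span>',
}

# Hypothetical storage rules: output field -> extracted field.
# They describe the final structure of each stored record.
STORAGE_RULES: Dict[str, str] = {
    "title": "headline",
    "byline": "author",
}


def extract(html: str, rules: Dict[str, str]) -> Dict[str, List[str]]:
    """Apply every extraction rule and collect all matches per field."""
    flags = re.S | re.I
    return {
        name: [m.strip() for m in re.findall(pattern, html, flags)]
        for name, pattern in rules.items()
    }


def store(extracted: Dict[str, List[str]],
          rules: Dict[str, str]) -> List[Dict[str, str]]:
    """Combine extracted fields positionally into records shaped by the storage rules."""
    if not rules:
        return []
    count = min(len(extracted[src]) for src in rules.values())
    return [
        {out: extracted[src][i] for out, src in rules.items()}
        for i in range(count)
    ]


# Made-up page fragment standing in for an electronic newspaper.
SAMPLE_HTML = """
<div><h2>First story</h2><span class="author">A. Writer</span></div>
<div><h2> Second story </h2><span class="author">B. Reporter</span></div>
"""

records = store(extract(SAMPLE_HTML, EXTRACTION_RULES), STORAGE_RULES)
for record in records:
    print(record)
```

If the hypothetical source changed its markup, for instance by moving headlines from `h2` to `h3`, only the `headline` pattern in `EXTRACTION_RULES` would need editing. Regular expressions are used here for brevity; a real HTML parser would be more robust to markup variations.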

Complete Chapter List

Table of Contents
Masoud Mohammadian
Chapter 1
Hui Yang, Minjie Zhang
The rapid proliferation of online textual databases on the Internet has made it difficult to effectively and efficiently search desired information...
Potential Cases, Database Types, and Selection Methodologies for Searching Distributed Text Databases
Chapter 2
Masoud Mohammadian, Ric Jentzsch
The World Wide Web has added an abundance of data and information to the complexity of information for disseminators and users alike. With this...
Computational Intelligence Techniques Driven Intelligent Agents for Web Data Mining and Information Retrieval
Chapter 3
Juan Manuel Dodero, Paloma Diaz, Ignacio Aedo
Knowledge creation or production in a distributed knowledge management system is a collaborative task that needs to be coordinated. A multi-agent...
A Multi-Agent Approach to Collaborative Knowledge Production
Chapter 4
Jin Sung Kim
One of the attractive topics in the field of Internet business is blending Artificial Intelligence (AI) techniques with the business process. In...
Customized Recommendation Mechanism Based on Web Data Mining and Case-Based Reasoning
Chapter 5
David Camacho, Ricardo Aler, Juan Cuadrado
How to build intelligent robust applications that work with the information stored in the Web is a difficult problem for several reasons which arise...
Rule-Based Parsing for Web Data Extraction
Chapter 6
Rowena Chau, Chung-Hsing Yeh
This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The multilingual...
Multilingual Web Content Mining: A User-Oriented Approach
Chapter 7
Kaïs Khrouf, Chantal Soule-Dupuy
An enterprise memory must be able to be used as a basis for the processes of scientific or technical developments. Indeed, it was proven that...
A Textual Warehouse Approach: A Web Data Repository
Chapter 8
T. Beran, T. Macek
This chapter describes a rather less traditional technique of text processing. The technique is based on the binary neural network Correlation...
Text Processing by Binary Neural Networks
Chapter 9
Daniel Rivero, Juan R. Rabunal, Julián Dorado, Alejandro Pazos, Nieves Pedreira
In this chapter, we present an application of Genetic Programming (GP) in the field of data mining and extraction of Artificial Neural Networks...
Extracting Knowledge from Databases and ANNs with Genetic Programming: Iris Flower Classification Problem
Chapter 10
Koichi Kurumatani
We propose a social coordination mechanism that is realized with CONSORTS, a new kind of multi-agent architecture for ubiquitous agents. By social...
Social Coordination with Architecture for Ubiquitous Agents-CONSORTS
Chapter 11
A. Andreevskaia, R. Abi-Aad, T. Radhakrishnan
This chapter presents a tool for knowledge acquisition for user profiling in electronic commerce. The knowledge acquisition in e-commerce is a...
Agent-Mediated Knowledge Acquisition for User Profiling
Chapter 12
Shinichi Nagano, Yasuyuki Tahara, Tetsuo Hasegawa, Akihiko Ohsuga
Heavy electric machinery industry is currently developing electronic market places of product and parts. PLIB is the standard of dictionary model...
Development of Agent-Based Electronic Catalog Retrieval System
Chapter 13
Samhaa R. El-Baltagy, Ahmed Rafea, Yasser Abdelhamid
This chapter presents a simple framework for extracting information found in publications or documents that are issued in large volumes and which...
Using Dynamically Acquired Background Knowledge for Information Extraction and Intelligent Search
Chapter 14
Shanfeng Zhu, Xiaotie Deng, Qizhi Fang, Weimin Zhang
Web search engines are one of the most popular services to help users find useful information on the Web. Although many studies have been carried...
A Study on Web Searching: Overlap and Distance of the Search Engine Results
Chapter 15
S. Vrettos, A. Stafylopatis
Our work proposes the use of topic taxonomies as part of a filtering language. Given a taxonomy, we train classifiers for every topic of it. The...
Taxonomy Based Fuzzy Filtering of Search Results
Chapter 16
Wei Lai, Maolin Huang, Kang Zhang
A graph can be used for web navigation. The whole of cyberspace can be regarded as one huge graph. To explore this huge graph, it is critical to...
Generating and Adjusting Web Sub-Graph Displays for Web Navigation
Chapter 17
Hong Shi, Ji-Fu Zhang
There are frequent occurrences of pattern match involved in the process of counting the support count of candidates, which is one of the main...
An Algorithm of Pattern Match Being Fit for Mining Association Rules
Chapter 18
Jon T.S. Quah, Y. M. Chen, Winnie C.H. Leow
With the rapid evolution of the Internet, information overload is becoming a common phenomenon. It is necessary to have a tool to help users extract...
Networking E-Learning Hosts Using Mobile Agents
About the Authors