Information Extraction: Methodologies and Applications
Jie Tang (Tsinghua University, China), Mingcai Hong (Tsinghua University, China), Duo Liang Zhang (Tsinghua University, China) and Juanzi Li (NEC Labs, China)
Copyright: © 2008
This chapter is concerned with the methodologies and applications of information extraction. Information is hidden in the large volume of web pages and thus it is necessary to extract useful information from the web content, called Information Extraction. In information extraction, given a sequence of instances, we identify and pull out a sub-sequence of the input that represents information we are interested in. In the past years, there was a rapid expansion of activities in the information extraction area. Many methods have been proposed for automating the process of extraction. However, due to the heterogeneity and the lack of structure of Web data, automated discovery of targeted or unexpected knowledge information still presents many challenging research problems. In this chapter, we will investigate the problems of information extraction and survey existing methodologies for solving these problems. Several real-world applications of information extraction will be introduced. Emerging challenges will be discussed.