Text Warehousing: Present and Future

Antonio Badia (University of Louisville, USA)
Copyright: © 2006 | Pages: 26
DOI: 10.4018/978-1-59140-655-6.ch004


Data warehouses, already established as the main repository of data in the enterprise, are now being used to store documents (e-mails, manuals, reports, etc.) so as to capture more domain information. To integrate information in natural language (so-called unstructured data) with information in the database (structured and semistructured data), existing techniques from Information Retrieval are being used. In this chapter, which is part overview and part position paper, we review these techniques and discuss their limitations. We argue that true integration cannot be achieved within the framework of Information Retrieval and introduce another paradigm, based on Information Extraction. We discuss the main characteristics of Information Extraction and analyze the challenges that stand in the way of this technology being widely used. Finally, we close with some considerations on future developments in the general area of documents in databases.
