Source Integration for Data Warehousing
Andrea Cali (University of Roma - La Sapienza, Italy), Domenico Lembo (University of Roma - La Sapienza, Italy), Maurizio Lenzerini (University of Roma - La Sapienza, Italy) and Riccardo Rosati (University of Roma - La Sapienza, Italy)
Copyright: © 2003
While the main goal of a data warehouse is to support data analysis and management decisions, a fundamental aspect of designing a data warehouse system is the process of acquiring the raw data from a set of relevant information sources. We call the component of a data warehouse system that handles this process the source integration system. Its main goal is to transfer data to the data warehouse from the set of sources that make up the application-oriented operational environment. Since sources are typically autonomous, distributed, and heterogeneous, this task must address the problems of cleaning, reconciling, and integrating the data coming from the sources. Designing a source integration system is a complex task that involves several distinct issues. The purpose of this chapter is to discuss the most important problems that arise in the design of a source integration system, with special emphasis on schema integration, query processing for data integration, and data cleaning and reconciliation.