In a distributed setting, the data are spread across several data sources, each of which holds only a fragment of the data. This is known as data fragmentation. Two common types are horizontal fragmentation, in which (possibly overlapping) subsets of data tuples are stored at different sites, and vertical fragmentation, in which (possibly overlapping) subtuples of data tuples are stored at different sites. More generally, the data may be fragmented into a set of relations (for example, tables of a relational database distributed across multiple sites).
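The two fragmentation types can be illustrated with a minimal Python sketch. The toy relation, its attribute names (`id`, `age`, `income`, `label`), and the helper functions are all hypothetical and chosen only for illustration; the sketch also assumes every tuple carries a unique key (`id`) that vertical fragments share so they can be rejoined.

```python
# Toy relation; all values and attribute names are made up for illustration.
ROWS = [
    {"id": 1, "age": 34, "income": 52000, "label": "yes"},
    {"id": 2, "age": 51, "income": 38000, "label": "no"},
    {"id": 3, "age": 27, "income": 61000, "label": "yes"},
]

def horizontal_fragments(rows, site_ids):
    """Each site stores a subset of whole tuples, selected by id (subsets may overlap)."""
    return [[r for r in rows if r["id"] in ids] for ids in site_ids]

def vertical_fragments(rows, site_attrs, key="id"):
    """Each site stores a subtuple of every tuple; the key is kept so fragments can be joined."""
    return [[{a: r[a] for a in (key, *attrs)} for r in rows] for attrs in site_attrs]

def merge_horizontal(fragments, key="id"):
    """Union of tuples; copies held by overlapping sites collapse on the key."""
    merged = {}
    for frag in fragments:
        for r in frag:
            merged[r[key]] = r
    return [merged[k] for k in sorted(merged)]

def merge_vertical(fragments, key="id"):
    """Join subtuples from all sites on the shared key."""
    merged = {}
    for frag in fragments:
        for r in frag:
            merged.setdefault(r[key], {}).update(r)
    return [merged[k] for k in sorted(merged)]

# Two sites with overlapping rows (tuple 2 is stored at both).
h = horizontal_fragments(ROWS, [{1, 2}, {2, 3}])
# Two sites with disjoint attribute sets, each also holding the key.
v = vertical_fragments(ROWS, [("age",), ("income", "label")])
```

In this sketch, `merge_horizontal(h)` and `merge_vertical(v)` both recover the original relation. That holds only because the fragments jointly cover every tuple and every attribute, which real distributed sources need not guarantee.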
Published in Chapter:
Learning Classifiers from Distributed Data Sources
Doina Caragea (Kansas State University, USA) and Vasant Honavar (Iowa State University, USA)
Copyright: © 2009
Pages: 8
DOI: 10.4018/978-1-60566-242-8.ch063
Abstract
Recent developments in high-throughput data acquisition technologies in a number of domains (e.g., biological sciences, atmospheric sciences, space sciences, commerce), together with advances in digital storage, computing, and communications technologies, have resulted in the proliferation of physically distributed data repositories created and maintained by autonomous entities (e.g., scientists, organizations). The resulting increasingly data-rich domains offer unprecedented opportunities for computer-assisted, data-driven knowledge acquisition in a number of applications, including data-driven scientific discovery, data-driven decision-making in business and commerce, monitoring and control of complex systems, and security informatics.