Introduction
Data integration has long been discussed in the literature. Many concerns arise because the datasets addressed by individual applications are very often heterogeneous and geographically distributed. Hence, the ability to make data stores interoperable remains a crucial factor in the development of such systems (Wohrer et al., 2004). Clearly, one of the main challenges in achieving this interoperability is data integration, and these challenges have been widely discussed (Calvanese et al., 1998; Reinoso et al., 2008). Moreover, Foster et al. (2001) explain that the combination of large dataset size, geographic distribution of users and resources, and computationally intensive analysis results in complex and stringent performance demands that, until recently, no existing computational and data management infrastructure had satisfied. Recent advances in computer networking and digital resource integration have given rise to the concept of Grid technology. In particular, Grid computing addresses the issues of collaboration and of data and resource sharing (Kodeboyina, 2004). It has been described as the infrastructure and set of protocols that enable the integrated, collaborative use of distributed heterogeneous resources, including high-end computers, networks, databases, and scientific instruments, owned and managed by multiple organizations referred to as Virtual Organizations (Foster, 2002). A Virtual Organization (VO) is formed when different organizations come together to share resources and collaborate in order to achieve a common goal (Foster et al., 2002).
The need to integrate databases into the Grid in order to support scientific and business database applications has also been recognized (Nieto-Santisteban, 2004; Antonioletti et al., 2005). Significant effort has gone into defining requirements and protocols and into implementing the OGSA-DAIS (Open Grid Services Architecture – Data Access and Integration Services) specification, which gives users the means to develop data Grids that conveniently control the sharing, access and management of large amounts of distributed data in Grid environments (Antonioletti et al., 2005; Atkinson et al., 2003). Ideally, as a data integration specification, OGSA-DAIS aims to allow users to specify ‘what’ information is needed without having to provide detailed instructions on ‘how’ or ‘from where’ to obtain it (Reinoso Castillo et al., 2004).
On the other hand, working with obsolete data leads to an information gap that may in turn compromise decision-making. Bessis (2009) and Bessis and Asimakopoulou (2008) explain that enabling collaborators to stay automatically informed of data that may change over time creates value for them. Repeatedly searching data sources for the latest relevant information on a specific topic of interest can be both time-consuming and frustrating. A set of technologies collectively referred to as ‘Push’, ‘NetCasting’ or ‘WebCasting’ was introduced in the late 1990s to automate these search and retrieval functions. Ten years on, Web Services have taken over most of the functionality of Push technology and have become a standard that supports recent developments in Grid computing, providing state-of-the-art technology for data and resource integration.