Sharing of Distributed Geospatial Data through Grid Technology
Yaxing Wei (George Mason University, USA), Liping Di (George Mason University, USA), Guangxuan Liao (University of Science and Technology China, China), Baohua Zhao (University of Science and Technology China, China), Aijun Chen (George Mason University, USA) and Yuqi Bai (George Mason University, USA)
Copyright: © 2009
With the rapid accumulation of geospatial data and the advancement of geoscience, there is a critical need for an infrastructure that can integrate large-scale, heterogeneous, and distributed storage systems so that geospatial data can be shared among multiple user communities. This article examines the feasibility of sharing distributed geospatial data through Grid computing technology. It introduces several major issues that geospatial data sharing must face (system heterogeneity, a uniform mechanism to publish and discover geospatial data, performance, and security) and discusses how Grid technology can help to solve them. Some recent research efforts, such as the Earth System Grid (ESG) and the Data Grid system at GMU CSISS, have shown that Grid technology provides a large-scale infrastructure that can seamlessly integrate dispersed geospatial data and provide uniform and efficient ways to access them.
In the past, geospatial applications were mostly designed for a single workstation or supercomputer. The geospatial data they needed to process were limited to a single storage system or to locally networked storage systems. The high-level geospatial data and information they generated were also difficult to share with other geospatial user communities because computing platforms and storage systems were isolated and heterogeneous. Today’s complex geospatial problems need applications that can analyze large quantities of geospatial data from different sources that were isolated from each other in the past. For example, a statistical wildfire forecast model at 8-km spatial resolution over the conterminous USA used over 1 terabyte of data obtained from different sources, including data derived from measurements of the NASA Earth Observing System (EOS) satellites and daily weather data provided by the National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Center (Ramapriyan et al., 2006). The data volume will increase significantly if similar models of finer spatial resolution, such as 1 km, are used. The models are changed and refined from time to time, and new geospatial data, such as the NASA EOS data and NOAA climate data, are being collected by satellites continuously. A fixed computing environment that contains only static data sources cannot support this kind of geospatial application. Consequently, the capability to access large quantities of distributed geospatial data seamlessly and dynamically is the key to the success of today’s and tomorrow’s geospatial applications.
Key Terms in this Chapter
Data Replica: A complete or partial copy of original data.
DPSS: The Distributed-Parallel Storage System (DPSS) is a scalable, high-performance, distributed-parallel data storage system originally developed as part of the DARPA-funded MAGIC Testbed, with additional support from the U.S. Dept. of Energy, Energy Research Division, Mathematical, Information, and Computational Sciences Office.
HPSS: High Performance Storage System (HPSS) is hierarchical storage system software that manages and accesses terabytes to petabytes of data on disk and robotic tape libraries.
SRB: The Storage Resource Broker (SRB) is a Data Grid Management System (DGMS) or simply a logical distributed file system based on a client-server architecture which presents the user with a single global logical namespace or file hierarchy.
Certificate: A public key and information about the certificate owner bound together by the digital signature of a certificate authority (CA). A CA's own root certificate is self-signed, i.e., it is signed with the private key that corresponds to the public key it contains.
Grid Technology: Grid technology is an emerging computing model that provides the ability to perform higher throughput computing by taking advantage of many networked computers to model a virtual computer architecture that is able to distribute process execution across a parallel infrastructure.
OGSA-DAI: Open Grid Services Architecture – Data Access and Integration. It is a middleware product that supports the exposure of data resources, such as relational or XML databases, on Grids.
Virtual Organization: A Virtual Organization is a group of individuals or institutions who share the computing resources of a “Grid” for a common goal.
GridFTP: An extension of the traditional FTP protocol. It provides a uniform, secure, high-performance interface to file-based storage systems on the Grid.
X.509: In cryptography, X.509 is an ITU-T standard for public key infrastructure (PKI). X.509 specifies, amongst other things, standard formats for public key certificates and a certification path validation algorithm.
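To make two of the terms above more concrete, the following is a minimal sketch of how data replicas might be tracked under a single logical namespace, in the spirit of the Data Replica and SRB entries. It is an illustration only and does not reproduce the interface of SRB or of any Grid middleware. The class names `Replica` and `ReplicaCatalog`, the logical path, and the `gsiftp://` host names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy (complete or partial) of a logical dataset."""
    url: str        # hypothetical physical location, written as a GridFTP-style URL
    complete: bool  # False for a partial copy, e.g. a spatial subset


@dataclass
class ReplicaCatalog:
    """Maps names in one logical namespace to their physical replicas."""
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        # Several replicas may be registered under the same logical name.
        self.entries.setdefault(logical_name, []).append(replica)

    def lookup(self, logical_name: str, require_complete: bool = False) -> list[Replica]:
        # Unknown logical names yield an empty list rather than an error.
        replicas = self.entries.get(logical_name, [])
        if require_complete:
            return [r for r in replicas if r.complete]
        return list(replicas)


# Illustrative data: the logical name and both hosts are made up.
catalog = ReplicaCatalog()
catalog.register(
    "/vo/modis/ndvi_2005.hdf",
    Replica("gsiftp://site-a.example.org/data/ndvi_2005.hdf", complete=True),
)
catalog.register(
    "/vo/modis/ndvi_2005.hdf",
    Replica("gsiftp://site-b.example.org/cache/ndvi_2005_subset.hdf", complete=False),
)

full_copies = catalog.lookup("/vo/modis/ndvi_2005.hdf", require_complete=True)
```

In this sketch, a user refers only to the logical name, and the catalog decides which physical copies qualify. A real system would also need to handle access control and the choice of the best replica, which are omitted here.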