Modeling and Computing Overlapping Aggregation of Large Data Sequences in Geographic Information Systems

Driss En-Nejjary (Université Clermont Auvergne, LIMOS, Irstea, UR TSCF, Clermont-Ferrand, France), Francois Pinet (Université Clermont Auvergne, Irstea, UR TSCF, Centre de Clermont-Ferrand, France) and Myoung-Ah Kang (Université Clermont-Auvergne, LIMOS, Clermont-Ferrand, France)
Copyright: © 2019 | Pages: 22
DOI: 10.4018/IJISMD.2019010102

Abstract

In recent years, the technology for acquiring geo-referenced data in information systems has advanced considerably. Optimizing the processing of these data is now a real issue, and several research works have proposed multi-core approaches to analyze large geo-referenced datasets. In this article, different methods based on general-purpose computing on graphics processing units (GPGPU) are modelled and compared to parallelize overlapping aggregations of raster sequences. Our methods are tested on a sequence of rasters representing the evolution of temperature over time for the same region. Each raster corresponds to a different data acquisition time period, and each geo-referenced raster cell is associated with a temperature value. This article proposes optimized methods to calculate the average temperature for the region for all possible raster subsequences of a given length, i.e., to calculate overlapping aggregated data summaries. In these aggregations, the same subsets of values are aggregated several times. This type of aggregation can be useful in different environmental data analyses, e.g., to pre-calculate all the average temperatures in a database. The article reports a significant increase in performance: GPGPU parallel processing ran the aggregations more than 50 times faster than the sequential method when the data transfer cost is included, and more than 200 times faster when it is excluded.

1. Introduction

Advances in sensor technology, remote sensing and computing have led to the production of large volumes of spatial data. Sensors are now smaller, cheaper and even smart (Melesse et al., 2007). A growing number of geo-referenced sensors are deployed for applications such as environmental monitoring, precision agriculture and positioning. Remote sensing and spatiotemporal simulation also produce large geo-referenced datasets. The generation of spatiotemporal data at large scale and high resolution has led to the development of new techniques to manage the volumes of data produced (Prasad et al., 2015).
Many of these data take the form of raster sets. A raster is a geo-referenced two-dimensional array in which each cell is associated with a value. The cells of a raster can be rendered as pixels whose colors correspond to different values of a measure, such as temperature, vegetation density or CO2 measurements (Kang et al., 2015). Data availability and data storage are often no longer barriers; in many cases the real bottleneck is the analysis of these spatial data, which continue to grow dramatically (Barbian and Assunção, 2017).
Map algebra is one of the usual raster analysis techniques. It is a set of conventions, capabilities and techniques for processing rasters (Pullar, 2001). Map algebra defines how a set of rasters can be aggregated; it provides different types of functions that usually take one or several rasters as input and return one raster or one indicator as a result (Tomlin, 1994). This technique can be used to produce a summary of a set of rasters. For example, Kang et al. (2015) present a method that aggregates a set of rasters representing geo-referenced air quality values over time to produce a single raster summarizing the air quality for a given period.
In (Kang et al., 2015), a data warehouse lets users access a raster summary instead of browsing all the rasters. There is one raster per 15-minute time period and per pollutant. Every raster in the data set is associated with the same spatial region. As shown in Table 1, users visualize raster summaries aggregated along different dimensions (time, pollutant family, etc.).

Table 1.
Visualizing raster summaries according to different dimensions
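
The kind of raster summary discussed above, in which a set of rasters covering the same region is aggregated into a single raster, can be sketched in a few lines. This is only an illustration, not the implementation of Kang et al. (2015): the function name `average_rasters`, the list-of-rows raster representation and the sample values are assumptions, and the sketch supposes that all rasters have the same dimensions and no missing cells.

```python
def average_rasters(rasters):
    """Cell-by-cell mean of equally sized rasters (each a list of rows)."""
    if not rasters:
        raise ValueError("at least one raster is required")
    n_rows, n_cols = len(rasters[0]), len(rasters[0][0])
    # Accumulate the sum of each cell over all rasters.
    totals = [[0.0] * n_cols for _ in range(n_rows)]
    for raster in rasters:
        for i in range(n_rows):
            for j in range(n_cols):
                totals[i][j] += raster[i][j]
    count = len(rasters)
    # Divide each accumulated sum by the number of rasters.
    return [[total / count for total in row] for row in totals]


# Three hypothetical 2x2 rasters for the same region at successive times.
t0 = [[10.0, 12.0], [14.0, 16.0]]
t1 = [[12.0, 14.0], [16.0, 18.0]]
t2 = [[14.0, 16.0], [18.0, 20.0]]

summary = average_rasters([t0, t1, t2])
print(summary)  # [[12.0, 14.0], [16.0, 18.0]]
```

Each output cell depends only on the cells at the same position in the input rasters, which is what makes this type of map algebra function easy to parallelize.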

When the rasters and the raster dataset are small, map algebra operations execute quickly because map algebra functions are usually based on simple arithmetic operations (addition, subtraction, minimum, maximum, average, etc.). However, when a large raster dataset is used, the raster computation must be optimized to obtain a reasonable execution time. As raster processing is often highly parallelizable, one way to improve performance is to use a Graphics Processing Unit (GPU).
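
To make the overlapping aggregation from the abstract concrete, the following sequential sketch computes the regional average for every contiguous subsequence of a given length. It is not one of the GPGPU methods compared in the article. It assumes that the regional average is a single value per subsequence, that all rasters have the same dimensions, and that no cell is missing; the name `window_region_averages` and the sample data are hypothetical. Because consecutive subsequences share most of their rasters, the sketch sums each raster once and then slides a window over those sums.

```python
def window_region_averages(rasters, length):
    """Regional mean for every contiguous subsequence of `length` rasters."""
    if length <= 0 or length > len(rasters):
        raise ValueError("length must be between 1 and len(rasters)")
    cells = len(rasters[0]) * len(rasters[0][0])
    # Sum of all cells of each raster, computed once and reused by
    # every window that contains the raster.
    raster_sums = [sum(sum(row) for row in raster) for raster in rasters]
    window_sum = sum(raster_sums[:length])
    averages = [window_sum / (length * cells)]
    for start in range(1, len(rasters) - length + 1):
        # Slide the window: add the entering raster, drop the leaving one.
        window_sum += raster_sums[start + length - 1] - raster_sums[start - 1]
        averages.append(window_sum / (length * cells))
    return averages


# Four hypothetical 2x2 temperature rasters with constant values.
sequence = [[[v, v], [v, v]] for v in (10.0, 20.0, 30.0, 40.0)]

result = window_region_averages(sequence, 2)
print(result)  # [15.0, 25.0, 35.0]
```

A sequence of n rasters yields n - length + 1 overlapping subsequences, so a naive approach that re-reads every raster for every window repeats much of the same work.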
