Data Mining Approaches for Geo-Spatial Big Data: Uncertainty Issues


Frederick E. Petry (Geospatial Science and Technology Branch, Naval Research Laboratory, Stennis Space Center, MS, USA)
DOI: 10.4018/joci.2012010104

Abstract

The availability of a vast amount of heterogeneous information from a variety of sources, ranging from satellite imagery to the Internet, has been termed the problem of Big Data. Currently there is a great emphasis on the huge amount of geophysical data that has a spatial basis or spatial aspects. To utilize such volumes of data effectively, data mining techniques are needed to manage discovery. An important consideration for this sort of data mining is to extend techniques to manage the inherent uncertainty involved in such spatial data. In this paper the authors first provide overviews of uncertainty representations based on fuzzy, intuitionistic, and rough set theory, and of data mining techniques. To illustrate the issues, they focus on the discovery of association rules for vague spatial data. The extensions of association rule extraction for uncertain data as represented by rough and fuzzy sets are described. Finally, an example of rule extraction for both fuzzy and rough set types of uncertainty representations is given.

Introduction

Many large research efforts are currently focused on the problem known as Big Data (Boyd & Crawford, 2012; Michael & Miller, 2013). The central issue is how to utilize effectively a vast amount of heterogeneous information from a variety of sources (Shekar, et al., 2012). Currently there is a great emphasis on geophysical data that has a spatial basis or spatial aspects (Overpeck, et al., 2011). Advances in instrumentation and sensors have hugely increased the volume, velocity and variety of remotely sensed data. For example, the imagery data archived at the NASA EOSDIS (Earth Observing System Data and Information System) exceeds 3 PB (petabytes), with 5 TB (terabytes) of data generated per day. To utilize such volumes of data effectively, data mining techniques are critical (Vatsavi, et al., 2012). One factor that must be considered in particular is how to deal with the inherent uncertainty in the huge amount of such spatial data held in databases.

Data mining or knowledge discovery (Witten, Frank & Hall 2011; Kantardzic, 2011) generally refers to a variety of techniques that have been developed in the fields of databases, machine learning (Alpaydin 2004) and pattern recognition (Han and Kamber 2006). The intent is to uncover useful patterns and associations from large databases. For complex data such as that found in spatial databases (Shekar & Chawla 2003), the problem of data discovery is more involved (Lu et al., 1993; Miller & Han 2009).

Spatial data has traditionally been the domain of geography, with various forms of maps as the standard representation. With the computerization of maps, geographic information systems (GIS) have come to the fore, with spatial databases storing the underlying point, line and area structures needed to support GIS (Longley et al., 2010). A major difference between data mining in ordinary relational databases (Elmasri & Navathe 2010) and in spatial databases is that attributes of the neighbors of some object of interest may have an influence on the object and therefore have to be considered as well. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood (such as topological, distance and direction relations), which are used by spatial data mining algorithms (Ester et al., 2000).
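
As a rough illustration of how implicit neighborhood relations can be derived from explicit locations, the following minimal Python sketch computes a distance relation and a coarse direction relation for point objects, and then summarizes an attribute over an object's neighbors. The data, the tuple layout and the function names (neighbors, direction, neighborhood_mean) are assumptions made for illustration only; they are not taken from the article or from any particular GIS library, and topological relations between line and area objects are not covered.

```python
import math

# Hypothetical point objects: (name, x, y, attribute value).
objects = [
    ("A", 0.0, 0.0, 10.0),
    ("B", 1.0, 0.0, 20.0),
    ("C", 0.0, 2.0, 30.0),
    ("D", 5.0, 5.0, 40.0),
]

def neighbors(target, objs, max_dist):
    """Distance relation: objects within max_dist of target (target excluded)."""
    _, tx, ty, _ = target
    return [o for o in objs
            if o is not target
            and math.hypot(o[1] - tx, o[2] - ty) <= max_dist]

def direction(target, other):
    """Coarse direction relation of other relative to target."""
    dx = other[1] - target[1]
    dy = other[2] - target[2]
    if abs(dx) >= abs(dy):
        return "east" if dx > 0 else "west"
    return "north" if dy > 0 else "south"

def neighborhood_mean(target, objs, max_dist):
    """Mean attribute value over the neighbors of target, or None if it has none."""
    near = neighbors(target, objs, max_dist)
    if not near:
        return None
    return sum(o[3] for o in near) / len(near)

a = objects[0]
near_a = neighbors(a, objects, max_dist=2.5)
names = [o[0] for o in near_a]
dirs = {o[0]: direction(a, o) for o in near_a}
mean_a = neighborhood_mean(a, objects, max_dist=2.5)
```

Here the neighbors of A within a distance of 2.5 are B and C, so a mining algorithm that considers neighborhood influence would see the attribute values of B and C alongside A's own value.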

Additionally, when we wish to consider vagueness or uncertainty in the spatial data mining process (Burrough & Frank 1996; Zhang & Goodchild 2002), an additional level of difficulty is added. In this paper we describe one of the most common data mining approaches, the discovery of association rules, for spatial data in which we consider uncertainty in the extracted rules as represented by both fuzzy set and rough set techniques.
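
To make the idea of association rules over fuzzy data concrete, the sketch below computes support and confidence when each record holds membership degrees in [0, 1] rather than crisp yes/no values. It assumes one common formulation, in which a record's membership in an itemset is the minimum of its item memberships and support is the average over all records; the article's own definitions may differ. The records, the attribute names (near_coast, high_elevation, flood_risk) and the function names are hypothetical.

```python
# Hypothetical spatial records with fuzzy membership degrees in [0, 1].
records = [
    {"near_coast": 0.9, "high_elevation": 0.1, "flood_risk": 0.8},
    {"near_coast": 0.7, "high_elevation": 0.2, "flood_risk": 0.6},
    {"near_coast": 0.2, "high_elevation": 0.9, "flood_risk": 0.1},
    {"near_coast": 0.4, "high_elevation": 0.5, "flood_risk": 0.3},
]

def fuzzy_support(items, recs):
    """Average over records of the minimum membership among the given items."""
    return sum(min(r[i] for i in items) for r in recs) / len(recs)

def fuzzy_confidence(antecedent, consequent, recs):
    """Support of antecedent and consequent together, divided by antecedent support."""
    supp_a = fuzzy_support(antecedent, recs)
    if supp_a == 0:
        return 0.0
    return fuzzy_support(antecedent + consequent, recs) / supp_a

# Rule under test: near_coast -> flood_risk
supp = fuzzy_support(["near_coast", "flood_risk"], records)
conf = fuzzy_confidence(["near_coast"], ["flood_risk"], records)
```

With crisp memberships (only 0 and 1) these functions reduce to the usual counting definitions of support and confidence, which is why this form is often used as a starting point for fuzzy extensions.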
