The Integral of Spatial Data Mining in the Era of Big Data: Algorithms and Applications

Gebeyehu Belay Gebremeskel (Chongqing University, China), Chai Yi (Chongqing University, China) and Zhongshi He (Chongqing University, China)
DOI: 10.4018/978-1-5225-2031-3.ch006


Data Mining (DM) is a rapidly expanding field in many disciplines, and it is increasingly used to analyze massive data sets, including geospatial, image and other forms of data. This fast-growing data is characterized by high volume, velocity, variety, variability and value. It is collected and generated from various sources, and it is too large and complex for traditional tools to capture, store and analyze. Spatial Data Mining (SDM) is, therefore, the process of searching for and discovering valuable information and knowledge in large volumes of spatial data, drawing on basic principles from databases, machine learning, statistics, pattern recognition and 'soft' computing. Using DM techniques enables more efficient use of the data warehouse. SDM is thus becoming an emerging research field in the geosciences, because the increasing amount of data leads to promising new applications. The integral SDM on which this chapter focuses concerns inference from geospatial and GIS data.
Chapter Preview


Data Mining (DM) is a rapidly expanding field in many disciplines. It plays a significant role in human activities and has become an essential component of the Knowledge Discovery (KD) process, which is employed to analyze large-scale data from different sources and perspectives (Martin et al., 2001; Krzysztof et al., 1996). It is increasingly used to analyze massive data sets, including geospatial, astronomic, climate, image and other forms of data. This fast-growing data is characterized by high volume, velocity, variety, variability and value, and it is collected and generated from various sources. These sources include satellites, remote sensing, the Global Positioning System (GPS), aerial images, photographs, log files, social media, machines, video and text, and they produce data that is too large and complex for traditional tools to capture, store and analyze (Diansheng & Jeremy, 2009). They have strained the capabilities of classical relational Database Management Systems (DBMS) and spawned a host of new technologies, approaches and platforms called “Big Data.” The potential value of applying DM techniques to geospatial data is therefore great, and it has been established by a growing number of studies (Deepali, 2013; Ranga et al., 2012).

Over the last two decades, the integration of DM and Geographic Information Systems (GIS) has been limited, and existing spatial data analysis techniques struggle with the huge amount of complex data to be processed (Anselin, 1998). Indeed, earth observation data (acquired from optical, radar and hyperspectral sensors installed on terrestrial, airborne or space-borne platforms) is often heterogeneous, multi-scale, incomplete and composed of diverse objects. However, existing data analytics have been traditional, offering only very basic spatial analysis functionality confined to descriptive statistical displays such as histograms and pie charts. Moreover, the complete DM process is a combination of many sub-processes, including data extraction and cleaning, feature selection, algorithm design and other analytics of the spatial data (Zaragozi et al., 2012). In many geospatial research works, non-spatial data has not been well addressed or synthesized by any of the available data analysis techniques. Based on these and other facts, we propose an Integral Spatial DM (ISDM) to discuss and introduce a novel way of handling and analyzing geospatial and non-spatial data, one that gives the flexibility to describe these elements together in order to optimize spatially based decision-making (Carlos et al., 2013; Xing et al., 2013).

Key Terms in this Chapter

RS: A specialized software application that processes remote sensing data and is capable of reading file formats that contain sensor image data and georeferencing information, such as satellite data and sensor metadata.

GIS: Also known as geospatial information systems (or science), a GIS is a system of computer software and hardware that enables users to capture, store, manage and analyze geographically referenced data, for manipulation, viewing and analysis in whichever context and with whichever parameters the user needs.

Spatial Relational Database: A database that records data attributes together with the explicit location and extension of spatial objects. These locations and extensions define the objects' spatial neighborhood relations (such as topological, distance and direction relations), which are used by spatial data mining algorithms.

Spatial Big Data: Big spatial data has been, and continues to be, collected with global positioning systems, wearable devices and other means. It calls for dynamic and scalable technologies that can analyze large and complex spatial data in order to acquire new insights and knowledge and to support decision-making and policy-making processes.

Data Proximity: Proximity refers to the co-occurrence of terms in language, that is, the presence of two or more terms within a given set of data. Data proximity is therefore a data-driven approach to corpus semantic analysis used in various data mining techniques, which discriminates between data based on the idea that words occurring together or in similar contexts are related.

Mining Algorithms: A set of heuristics and computational methods that determine the optimal parameters for creating a mining model from spatial data. The model can then be applied to the entire dataset to extract actionable patterns and detailed statistics.

Spatial Database: A database that stores and queries data representing objects defined in a geometric space, including points, lines and polygons.

Geo-References: Georeferencing is the process of assigning spatial coordinates to each pixel of raster data that is spatial in nature but has no explicit geographic reference system.
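
As an illustration of the georeferencing definition above, the following is a minimal Python sketch of one common way to assign coordinates to raster pixels: a six-coefficient affine transform. The chapter preview does not specify a method, so the class name GeoTransform, its field names and the sample values (a 30-unit pixel size and an arbitrary upper-left origin) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical affine mapping from pixel (col, row) to map (x, y)."""

    origin_x: float             # x coordinate of the raster's upper-left corner
    origin_y: float             # y coordinate of the raster's upper-left corner
    pixel_width: float          # map units per pixel along a row
    pixel_height: float         # map units per pixel down a column (negative for north-up)
    row_rotation: float = 0.0   # contribution of the row index to x (0 for north-up)
    col_rotation: float = 0.0   # contribution of the column index to y (0 for north-up)

    def pixel_to_world(self, col: float, row: float) -> tuple[float, float]:
        """Return the map coordinates of the upper-left corner of a pixel."""
        x = self.origin_x + col * self.pixel_width + row * self.row_rotation
        y = self.origin_y + col * self.col_rotation + row * self.pixel_height
        return x, y

    def pixel_center(self, col: int, row: int) -> tuple[float, float]:
        """Return the map coordinates of the center of a pixel."""
        return self.pixel_to_world(col + 0.5, row + 0.5)


# Assumed example: a north-up raster with 30-unit pixels.
gt = GeoTransform(origin_x=500000.0, origin_y=4200000.0,
                  pixel_width=30.0, pixel_height=-30.0)
print(gt.pixel_to_world(10, 20))
print(gt.pixel_center(0, 0))
```

With a transform like this, every pixel index in an otherwise unreferenced image can be given a position in a chosen coordinate system. In practice, the six coefficients are usually estimated from ground control points or read from sensor metadata rather than typed in by hand.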
