Clustering Earthquake Data: Identifying Spatial Patterns From Non-Spatial Attributes

Cihan Savaş, Mehmet Samet Yıldız, Süleyman Eken, Cevat İkibaş, Ahmet Sayar
Copyright: © 2019 | Pages: 16
DOI: 10.4018/978-1-5225-7519-1.ch010

Abstract

Seismology, a sub-branch of geophysics, is one of the fields in which data mining methods can be applied effectively. In this chapter, data mining techniques are employed on multivariate seismic data to decompose the non-spatial variables. k-means clustering, density-based spatial clustering of applications with noise (DBSCAN), and hierarchical tree clustering algorithms are then applied to the decomposed data, and pattern analysis is conducted on the resulting clusters using spatial data. The analysis suggests that the clustering results, when combined with spatial data, are consistent with reality, and that characteristic features of earthquake-related regions can be determined by modeling seismic data with clustering algorithms. The baseline metric reported is clustering time for inputs of varying size.

Introduction

Data mining (DM) is an interdisciplinary sub-field of computer science that is closely related to areas such as artificial intelligence (AI), machine learning, database systems, computer algorithms, and statistics. It offers a range of approaches and methods and is widely employed in problem solving, financial data analysis, the telecommunication industry, bioinformatics, learning, and other scientific applications (Pierce et al., 2008; Aydin et al., 2008; Sayer, Pierce & Fox, 2005; Aktas et al., 2006; Aktas et al., 2005). DM is an automated process for discovering patterns, finding association rules, and detecting anomalous structures in large databases.

Clustering analysis groups the data or objects under study into clusters based on their similarities. Objects within a cluster should be as similar to one another as possible and as dissimilar as possible from objects in other clusters. The quality of a clustering method depends on how well it satisfies this rule of thumb. Moreover, the clustering approach is chosen based on the type of data and the goal of the application.
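The rule of thumb above can be illustrated with a minimal Python sketch (the chapter's own implementation is in R; Python is used here only for illustration). It compares the mean distance between points in the same cluster with the mean distance between points in different clusters. The function name `mean_pairwise_distances` and the toy points are assumptions made for this example and do not come from the chapter.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    # Visit every unordered pair of points once and file its distance
    # under "within" or "between" depending on the cluster labels.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: two well-separated groups in a 2-D feature space.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

within, between = mean_pairwise_distances(points, labels)
print(f"within={within:.3f} between={between:.3f}")
```

A good clustering of this kind of data should give a within-cluster mean that is clearly smaller than the between-cluster mean. Established indices such as the silhouette coefficient refine the same idea.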

The main objective of this study is to investigate the relationship between spatial and non-spatial data. Multi-attribute earthquake data from the United States Geological Survey (USGS) (Aktas et al., June 2005) is chosen to uncover this relationship and to represent it graphically. First, normalization and other steps are applied; then the non-spatial data is determined by applying data mining methods to the earthquake data.

After the normalization process:

  1. Density-based clustering is conducted and the results are observed. The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is applied (United States Geological Survey (USGS), 2015; R language).

  2. A dendrogram is created by hierarchical clustering. The Agglomerative Nesting (AGNES) algorithm is applied (see reference DBSCAN algorithm R package).

  3. The k-means clustering algorithm, which groups the data into k groups, is employed.

  4. A method is developed and implemented in the R programming language to correlate non-spatial features with spatial ones based on the results of the clustering algorithms. The data is plotted on a world map by latitude and longitude, which gives a graphical representation of how the clusters are distributed on the map.
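The workflow listed above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' R implementation. It min-max normalizes two non-spatial attributes, clusters them with a plain k-means (Lloyd's algorithm, seeded deterministically with the first k points), and then groups each event's latitude and longitude by cluster label so that the groups could be plotted on a map. The records, the attribute choice (magnitude and depth), and the helper names `min_max_normalize` and `k_means` are all hypothetical.

```python
from math import dist


def min_max_normalize(rows):
    """Scale each column of `rows` to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard against zero span
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))
            for row in rows]


def k_means(points, k, iterations=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels


# Hypothetical records, invented for illustration only:
# (latitude, longitude, magnitude, depth_km)
events = [
    (37.8, -122.4, 2.1, 8.0),
    (61.2, -149.9, 5.8, 45.0),
    (34.1, -118.2, 2.4, 10.0),
    (60.5, -151.0, 6.1, 50.0),
    (36.6, -121.9, 1.9, 6.0),
    (59.9, -152.3, 5.5, 40.0),
]

# Cluster on the non-spatial attributes only (magnitude, depth).
non_spatial = min_max_normalize([(mag, depth) for _, _, mag, depth in events])
labels = k_means(non_spatial, k=2)

# Attach the spatial attributes back to each cluster for mapping.
clusters = {}
for (lat, lon, _, _), label in zip(events, labels):
    clusters.setdefault(label, []).append((lat, lon))

for label, coords in sorted(clusters.items()):
    print(label, coords)
```

Clustering uses only the non-spatial columns, so any geographic grouping that appears when the coordinates are plotted is a property of the data rather than an input to the algorithm. DBSCAN or agglomerative clustering could be substituted for `k_means` at the same point in the sketch.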

The results of these studies provide a basis for comparing the clustering algorithms. Based on these comparisons, it is observed whether or not similar features exist for different regions. The RStudio platform, which provides tools for statistical computation and a high-level graphics language, is used in the application development phase. The R language also provides interfaces to other high-level programming languages and facilities for eliminating errors.

The remainder of this chapter is organized as follows. The “Related works” section presents the related work. The proposed framework is given in the “Architecture” section. The algorithms used are presented in the “Clustering algorithms” section. The performance results and their analysis are given in the “Performance tests and evaluation” section. The last section concludes the chapter.
