Clustering Based Sampling for Learning from Unbalanced Seismic Data Set

Mohamed Elhadi Rahmani (GeCoDe Laboratory, Department of Computer Science, University of Dr. Tahar Moulay, Saida, Algeria), Abdelmalek Amine (GeCoDe Laboratory, Department of Computer Science, University of Dr. Tahar Moulay, Saida, Algeria) and Reda Mohamed Hamou (GeCoDe Laboratory, Department of Computer Science, University of Dr. Tahar Moulay, Saida, Algeria)
Copyright: © 2017 | Pages: 22
DOI: 10.4018/IJGEE.2017070101


This article describes how some strata contain stress concentration zones; when the stress increases and exceeds a critical value, it destroys the rock, which causes the emission of seismic tremors of different energies. Seismology covers the study of the effects of seismic waves and the prediction of seismic hazards to rocks and coal longwalls. A main problem in this field is that the recorded data are unbalanced: hazardous cases are rare. Learning from unbalanced data is considered one of the most difficult open issues. This article presents an informed sampling method, based on a clustering approach, for the prediction of seismic hazards in Polish coal mines. The idea is to divide the non-hazardous examples, which represent more than 90% of real-life cases, into subsets in order to balance the classes, which facilitates learning from the recorded data. The authors evaluated the system on the prediction of seismic hazards and report positive results compared with classifying the examples without balancing the classes.
Article Preview


In our daily life, we are surrounded by a community of living organisms together with nonliving components. The latter keep amazing us in many ways, whether through phenomena that help survival or through disasters that threaten the ecosystem on Earth. One of the disasters that most threaten human life and cause huge damage is the earthquake.

Mining hazards are the dangers associated with mining activities. They are the causes of disasters and accidents, and they play an important role in shaping industrial safety in coal mines. As with earthquakes, the detection and prediction of seismic hazards is among the hardest problems in natural hazard detection. Seismic activity and seismic hazard in underground coal mines depend on the specific structure of the geological deposit and on the way the coal is exploited. The nature of these hazards is influenced by a large number of factors, and the relationships among them are complex and insufficiently understood. A particularly intense example occurs in the Upper Silesian Coal Basin, where additional conditions apply: the multi-seam structure of the deposit, the consequences of the long history of exploitation of the area, and complex surface infrastructure. Almost all mines in this area have systems that detect and assess the current degree of seismic hazard (Kabiesz, 2006). The hazard of a high-energy destructive tremor, which may result in a rock burst, is one of the major subjects of study for coal mine geophysical stations. As a phenomenon related to mining seismicity, rock bursts pose a serious hazard to miners and can destroy longwalls and equipment.

Data engineering and knowledge discovery emerged in the information era, in which the exponential growth of data volume has given data mining techniques an important role in daily life, including scientific data exploration. One such application is the classification of seismic hazards in coal mines, which is the aim of this work. This classification rests on two basic techniques, which are discussed in Section 2.1. Predicting seismic hazards means predicting not only the dangerous cases but also the normal ones. The difficulty is gaining geologists' confidence, which is undermined by the number of false alarms: a false alarm is a normal case predicted as dangerous, and it costs geologists money and time spent monitoring that case.

The problem with real-life cases such as seismic hazards is that some important cases occur much less often than others. For seismic hazards, more than 90% of the recorded data are non-hazardous cases, meaning no danger, while the main goal for researchers is to predict the dangerous cases. This imbalance in class distribution makes it difficult to learn from the recorded data in order to develop an intelligent system that predicts seismic danger in coal mines. Figure 1 shows the percentage of hazardous cases against non-hazardous ones.

Figure 1. The percentage of non-hazardous against hazardous cases
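To make the effect of this class distribution concrete, the sketch below scores a trivial classifier that always answers "non-hazardous". The counts (930 non-hazardous and 70 hazardous records) are hypothetical, chosen only to respect the "more than 90%" figure; they are not taken from the data set, and the function name is illustrative.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, recall and false-alarm rate for a binary prediction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0            # hazardous cases found
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0  # normal cases flagged
    return accuracy, recall, false_alarm_rate

# Hypothetical labels: 1 = hazardous, 0 = non-hazardous.
y_true = [0] * 930 + [1] * 70
# A classifier that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy, recall, false_alarm_rate = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} false alarms={false_alarm_rate:.2f}")
```

Under these assumed counts the classifier reaches 93% accuracy and raises no false alarms, yet it detects none of the hazardous cases, which is why accuracy alone is a poor guide on unbalanced data.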

With a data set such as the seismic one, the probability of classifying a hazardous case as non-hazardous is very high. As a simple example, when a hazardous case is classified with the k-Nearest Neighbors (KNN) algorithm, the majority of its neighbors are more likely to be non-hazardous than hazardous. The same problem arises with any classification algorithm, and it results in a high error rate when predicting seismic danger in coal mines.
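The KNN argument can be illustrated with a small sketch. The two-dimensional points below are invented for illustration (18 non-hazardous and 2 hazardous, a 90/10 split) and do not come from the seismic data set; `knn_predict` is a minimal majority-vote implementation, not the authors' code.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # `train` is a list of (feature_vector, label) pairs; squared Euclidean
    # distance gives the same neighbor ordering as Euclidean distance.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a grid of 18 non-hazardous points and 2 hazardous points.
non_hazardous = [
    ((x * 0.5, y * 0.5), "non-hazardous") for x in range(6) for y in range(3)
]
hazardous = [((3.0, 1.5), "hazardous"), ((3.2, 1.6), "hazardous")]
train = non_hazardous + hazardous

query = (3.1, 1.55)  # lies right between the two hazardous examples

for k in (1, 3, 5):
    print(k, knn_predict(train, query, k))
```

With k = 1 or k = 3 the query is labeled hazardous, but with k = 5 the vote is 3 to 2 in favor of non-hazardous: only two hazardous examples exist, so any larger neighborhood is filled with majority-class points even though the query sits next to the hazardous ones.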

This paper presents an informed sampling approach, based on the k-means clustering technique, for dealing with the unbalanced seismic data set. The remainder of the paper is organized as follows. Section 2 presents the state of the art, covering all aspects touched on in this work: the two basic techniques for classifying seismic hazards in coal mines, some related work on seismic hazard detection, and the best-known solutions in the literature for dealing with unbalanced data. Section 3 details the steps of our proposed approach and the metrics used to evaluate it, and Section 4 presents the results obtained. Finally, the major conclusions are given in Section 5.
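The preview does not give the details of the approach, so the following is only one plausible reading of clustering-based sampling, not the authors' implementation: the majority (non-hazardous) class is split with k-means, and each resulting cluster is paired with all minority (hazardous) examples to form a more balanced training subset. The function names, the deterministic centroid initialization, the choice of k, and the toy points are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    # Assumed deterministic initialization: k points spread through the input order.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment

def cluster_based_subsets(majority, minority, k):
    """Pair each k-means cluster of the majority class with the whole minority class."""
    assignment = kmeans(majority, k)
    subsets = []
    for c in range(k):
        cluster = [p for p, a in zip(majority, assignment) if a == c]
        if cluster:
            # Label 0 = non-hazardous (majority), 1 = hazardous (minority).
            subsets.append([(p, 0) for p in cluster] + [(p, 1) for p in minority])
    return subsets

# Hypothetical toy data: 12 majority points in three groups, 4 minority points.
majority = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
minority = [(5, 5), (5, 6), (6, 5), (6, 6)]

subsets = cluster_based_subsets(majority, minority, k=3)
for subset in subsets:
    print(len(subset), sum(label for _, label in subset))
```

On this toy input each of the three subsets holds four majority and four minority examples, so a classifier trained per subset sees balanced classes. How the per-subset models would be combined, and how k should be chosen on real data, is left open here because the preview does not say.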
