Augmenting Classifiers Performance through Clustering: A Comparative Study on Road Accident Data

Sachin Kumar (Graphic Era University, Dehradun, India), Prayag Tiwari (National University of Science and Technology MISIS, Moscow, Russia) and Kalitin Vladimirovich Denis (National University of Science and Technology MISIS, Moscow, Russia)
Copyright: © 2018 | Pages: 12
DOI: 10.4018/IJIRR.2018010104


Road and traffic accident data analysis is one of the prime research interests of the present era. It relates not only to public health and safety but also to the application of the latest techniques from domains such as data mining, statistics and machine learning. Road and traffic accident data differ in nature from other real-world data because road accidents are uncertain events. In this article, the authors compare three clustering techniques, namely latent class clustering (LCC), k-modes clustering and BIRCH clustering, applied to road accident data from an Indian district. Naïve Bayes (NB), random forest (RF) and support vector machine (SVM) classifiers are then used to classify the data by the severity of road accidents. The experiments show that LCC is the most suitable of the three techniques for generating clusters that maximize classification accuracy.
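The cluster-then-classify procedure described in this abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. A basic k-modes variant (simple matching dissimilarity, per-attribute modes, and an assumed naive initialization from the first k distinct records) partitions categorical accident records. A separate categorical Naïve Bayes classifier with Laplace smoothing is then fitted inside each cluster, and a new record is routed to the classifier of its nearest cluster mode. The toy records and the names `kmodes`, `fit_nb`, `predict_nb` and `cluster_then_classify` are hypothetical; LCC, BIRCH, RF and SVM are not shown.

```python
from collections import Counter, defaultdict
import math

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest(row, modes, candidates):
    """Candidate cluster index whose mode is closest to row (ties go to the first)."""
    return min(candidates, key=lambda c: mismatch(row, modes[c]))

def kmodes(records, k, max_iter=20):
    """Basic k-modes on tuples of categorical values.
    Assumption: naive initialization with the first k distinct records."""
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    if len(modes) < k:
        raise ValueError("fewer distinct records than clusters")
    for _ in range(max_iter):
        labels = [nearest(r, modes, range(k)) for r in records]
        new_modes = []
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if not members:  # keep the previous mode for an empty cluster
                new_modes.append(modes[c])
            else:  # most frequent value of each attribute within the cluster
                new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                       for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return [nearest(r, modes, range(k)) for r in records], modes

def fit_nb(X, y):
    """Categorical Naive Bayes: class counts and per-(attribute, class) value counts."""
    counts = defaultdict(Counter)
    for row, cls in zip(X, y):
        for j, v in enumerate(row):
            counts[(j, cls)][v] += 1
    n_values = [len(set(col)) for col in zip(*X)]
    return Counter(y), counts, n_values

def predict_nb(model, row):
    """Class with the highest Laplace-smoothed log posterior."""
    class_counts, counts, n_values = model
    total = sum(class_counts.values())
    def log_post(cls):
        n = class_counts[cls]
        return math.log(n / total) + sum(
            math.log((counts[(j, cls)][v] + 1) / (n + n_values[j]))
            for j, v in enumerate(row))
    return max(class_counts, key=log_post)

def cluster_then_classify(X, y, k):
    """Fit one Naive Bayes model per k-modes cluster; route rows to the nearest mode."""
    labels, modes = kmodes(X, k)
    models = {}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_nb([X[i] for i in idx], [y[i] for i in idx])
    return lambda row: predict_nb(models[nearest(row, modes, list(models))], row)

# Hypothetical toy records: (weather, road type, light condition) -> severity.
X = [("rain", "highway", "night"), ("rain", "highway", "night"),
     ("clear", "highway", "night"), ("clear", "urban", "day"),
     ("rain", "urban", "day"), ("clear", "urban", "day")]
y = ["fatal", "fatal", "minor", "minor", "minor", "minor"]

labels, modes = kmodes(X, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
predict = cluster_then_classify(X, y, 2)
print(predict(("rain", "highway", "night")))  # fatal
```

Fitting one classifier per cluster is the design choice being illustrated. Each model sees only records with a similar attribute profile, which is the intended benefit of reducing heterogeneity before classification.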
1. Introduction

Various studies have used statistical methods (Savolainen et al., 2010; Karlaftis & Tarko, 1998; Jones et al., 1991; Poch & Mannering, 1996; Maher & Summersgill, 1996) and data mining techniques (Kumar & Toshniwal, 2015a, 2016b, 2016c; Chang & Chen, 2005; Kashani et al., 2011; Prayag et al., 2017) to analyze road accident data and to establish relationships between accident attributes and road accident severity. The results of these studies are very useful because they reveal the different factors that affect road accidents. Awareness of these factors helps in taking preventive measures to reduce accident rates in the area of study. However, accident factors can have different impacts at different locations. Therefore, analyzing road accident data from new locations can reveal new information about the factors affecting accident severity there.

Several previous studies (Depaire et al., 2008; Ona et al., 2013; Kumar & Toshniwal, 2016d) noted that road accident data are usually affected by heterogeneity. In the presence of heterogeneity, it is difficult to establish a relationship between particular attribute values or factors and a particular type of road accident, accident severity or accident location. Therefore, this heterogeneity needs to be removed from the data before useful results can be extracted from it.

Kumar and Toshniwal (2015a) proposed a framework to remove heterogeneity from road accident data and suggested that clustering the data prior to analysis is a useful way to deal with it. Ona et al. (2013) used the latent class clustering (LCC) technique to remove heterogeneity from the data. They suggested that LCC is a very useful clustering technique that also provides several cluster selection criteria for identifying the number of clusters present in a data set. Further, Kumar and Toshniwal (2016d) performed a comparative study on road accident data from Haridwar, Uttarakhand, India. In this study, they used the LCC and k-modes (Chaturvedi et al., 2001; Kumar & Toshniwal, 2015b) clustering techniques to cluster the data before performing the analysis. They then used the Frequent Pattern (FP) growth technique to extract association rules describing the accident patterns in each cluster. They concluded that both techniques are similarly effective at forming clusters and are able to remove the heterogeneity from the data. However, their findings did not demonstrate the superiority of one technique over the other.
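The cluster selection criteria mentioned for LCC can be illustrated with a small sketch. A latent class model assumes that each record belongs to one of K unobserved classes, with class weights pi_c and, within each class, independent categorical distributions over the attributes. It is commonly fitted by expectation-maximization (EM), and candidate values of K are compared with an information criterion. The sketch assumes the Bayesian information criterion, BIC = -2 ln L + p ln n, where L is the likelihood, p the number of free parameters and n the number of records. It also adds a small smoothing term `alpha`. This is a hedged, standard-library illustration, not the procedure used by Ona et al. (2013); `fit_lcc`, `select_k` and the toy data are hypothetical.

```python
import math
import random

def fit_lcc(X, K, n_iter=100, alpha=0.01, seed=0):
    """EM for a latent class model on tuples of categorical values:
    P(x) = sum_c pi[c] * prod_j theta[c][j][x_j].
    alpha is an assumed smoothing term that keeps every probability above zero."""
    n, m = len(X), len(X[0])
    values = [sorted(set(row[j] for row in X)) for j in range(m)]
    rng = random.Random(seed)
    # Random soft class memberships to break symmetry between classes.
    resp = []
    for _ in range(n):
        w = [rng.random() + 0.1 for _ in range(K)]
        resp.append([x / sum(w) for x in w])
    ll = 0.0
    for _ in range(n_iter):
        # M-step: smoothed class weights and per-class value probabilities.
        nk = [sum(r[c] for r in resp) for c in range(K)]
        pi = [(nk[c] + alpha) / (n + K * alpha) for c in range(K)]
        theta = [[{v: (sum(r[c] for r, row in zip(resp, X) if row[j] == v) + alpha)
                      / (nk[c] + alpha * len(values[j]))
                   for v in values[j]}
                  for j in range(m)]
                 for c in range(K)]
        # E-step: posterior memberships via log-sum-exp; accumulate log-likelihood.
        ll = 0.0
        for i, row in enumerate(X):
            logp = [math.log(pi[c]) + sum(math.log(theta[c][j][row[j]]) for j in range(m))
                    for c in range(K)]
            top = max(logp)
            lse = top + math.log(sum(math.exp(lp - top) for lp in logp))
            resp[i] = [math.exp(lp - lse) for lp in logp]
            ll += lse
    # Free parameters: K - 1 class weights plus K * sum_j (V_j - 1) value probabilities.
    n_params = (K - 1) + K * sum(len(v) - 1 for v in values)
    bic = -2 * ll + n_params * math.log(n)
    return resp, ll, bic

def select_k(X, k_values):
    """Fit each candidate number of classes and keep the one with the lowest BIC."""
    bics = {k: fit_lcc(X, k)[2] for k in k_values}
    return min(bics, key=bics.get), bics

# Hypothetical data: two clearly different accident profiles, 20 records each.
X = [("rain", "highway", "night", "truck")] * 20 + [("clear", "urban", "day", "bike")] * 20
best_k, bics = select_k(X, [1, 2, 3])
print(best_k)  # 2
```

The penalty term p ln n is what stops the criterion from always preferring more classes: a third class cannot raise the likelihood enough on this toy data to pay for its extra parameters.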