1. Introduction
Various studies have used statistical methods (Savolainen et al., 2010; Karlaftis & Tarko, 1998; Jones et al., 1991; Poch & Mannering, 1996; Maher & Summersgill, 1996) and data mining techniques (Kumar & Toshniwal, 2015a, 2016b, 2016c; Chang & Chen, 2005; Kashani et al., 2011; Prayag et al., 2017) to analyze road accident data and establish relationships between accident attributes and road accident severity. The results of these studies are useful because they reveal the different factors that affect road accidents. Awareness of these factors helps in taking preventive measures to reduce accident rates in the area of study. However, accident factors affect different locations differently. Analyzing new road accident data can therefore reveal new information about the factors affecting accident severity in those locations.
Several previous studies (Depaire et al., 2008; Ona et al., 2013; Kumar & Toshniwal, 2016d) noted that road accident data usually suffers from heterogeneity. In the presence of heterogeneity, it is difficult to establish relationships between particular attribute values or factors and a particular type of road accident, accident severity, or accident location. It is therefore necessary to remove this heterogeneity so that useful results can be extracted from the data.
Kumar and Toshniwal (2015a) proposed a framework to remove heterogeneity from road accident data and suggested that clustering prior to analysis is a useful way to deal with it. Ona et al. (2013) used the latent class clustering (LCC) technique to remove heterogeneity from the data. They suggested that LCC is a very useful clustering technique that also provides several cluster selection criteria for identifying the number of clusters present in the data set. Further, Kumar and Toshniwal (2016d) performed a comparative study on road accident data from Haridwar, Uttarakhand, India. In this study, they used the LCC and K-modes (Chaturvedi et al., 2001; Kumar & Toshniwal, 2015b) clustering techniques to cluster the data prior to analysis. They then used the Frequent Pattern (FP) growth technique to extract association rules describing the accident patterns in each cluster. They concluded that both techniques perform similarly in cluster formation and are able to remove heterogeneity from the data. However, their findings did not demonstrate the superiority of one technique over the other.
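To make the idea of clustering categorical accident records more concrete, the following is a minimal sketch of a K-modes-style procedure in plain Python. It is not the implementation used in the cited studies. The toy records, the attribute values, and the function names (`hamming`, `column_modes`, `k_modes`) are all hypothetical and chosen only for illustration. The sketch assumes a simple matching dissimilarity (the number of differing attributes) and fixed initial modes.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent value of each attribute across the records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, initial_modes, max_iter=20):
    """Alternate between assigning records to the nearest mode and updating modes."""
    modes = list(initial_modes)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the closest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy records: (road type, lighting, severity). Invented data.
records = [
    ("highway", "night", "fatal"),
    ("highway", "night", "serious"),
    ("highway", "day", "fatal"),
    ("urban", "day", "minor"),
    ("urban", "day", "minor"),
    ("urban", "night", "minor"),
]

# Assumption: the first and fourth records serve as the initial modes.
labels, modes = k_modes(records, [records[0], records[3]])
print(labels)
print(modes)
```

On this toy data the records separate into a highway group and an urban group, and each mode summarizes the dominant attribute values of its cluster. A rule-mining step such as FP growth would then be run separately on the records of each cluster; that step is not shown here.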