This chapter considers the role of fuzzy decision trees as a tool for intelligent data analysis in domestic travel research. It demonstrates the readability and interpretability that the findings from a fuzzy decision tree analysis can offer, first through a small tutorial problem that allows each step of the analysis to be followed in full. An investigation of traffic fatalities across the states of the US then offers an example of a more comprehensive fuzzy decision tree analysis. Graphical representations of the fuzzy membership functions show how the necessary linguistic terms are defined. The final fuzzy decision trees, for both the tutorial problem and the US traffic fatalities problem, show the structured form the analysis offers, as well as the readable decision rules contained therein.
In a wide-ranging discussion of data analysis, Breiman (2001) advocates the development of new techniques, suggesting that the interpretation of results has an equal, if not more important, role to play than the predictive accuracy that is often the only criterion considered. For many researchers accustomed to the statistical inference of traditional regression-type analyses, the set-up costs necessary to understand novel techniques can dissuade them from employing such techniques. Domestic travel research is one area that can benefit from interpretable results, since policy making is often the central aim of the study undertaken (see for example, Ewing, 2003; Noland, 2003; Shemer, 2004).
This reluctance may be particularly true of nascent techniques based on uncertain reasoning (Chen, 2001), which, in one way or another, attempt to take into account the possible imperfection and/or relevance of the data to be studied. These imperfections include the imprecision of individual data values and, in the more extreme case, incompleteness, where a number of values are missing. One associated general methodology, fuzzy set theory (FST), introduced in Zadeh (1965), is closely associated with uncertain reasoning (Zadeh, 2005) and offers the opportunity to develop traditional techniques so that they incorporate vagueness and ambiguity in their operations. Within this fuzzy environment, data analysis is also extended to allow a linguistic facet to the interpretation of results.
In this chapter the technical emphasis is on decision trees within a fuzzy environment, a technique for the classification of objects described by a number of attributes. Armand, Watelain, Roux, Mercier, and Lepoutre (2007, p. 476) present a recent, succinct description of what fuzzy decision trees offer:
“Fuzzy decision trees (FDT) incorporate a notion of fuzziness that permits inaccuracy and uncertainty to be introduced and allow the phenomena under consideration to be expressed using natural language.”
They believe that their application in gait analysis benefits from this allowance for imprecision and from the resulting interpretability. Pertinent to this edited book, Wang, Nauck, Spott, and Kruse (2007) consider fuzzy decision trees in relation to intelligent data analysis; the motivation for their study was the belief that typical business users prefer software that hides complexity from them and automates the data analysis process. A further implication of using fuzzy decision trees is that they inherently include feature selection (Mikut, Jäkel, & Gröll, 2005), whereby small subsets of features with high discriminating power are found.
The fuzzy approach employed here was presented in Yuan and Shaw (1995) and Wang, Chen, Qian, and Ye (2000), and attempts to include the cognitive uncertainties evident in the imprecision inherent in the data values. This is achieved, notably, through the construction of fuzzy membership functions (MFs), which give each value of a numerical attribute a level of association with the linguistic terms of its linguistic variable representation (Kecman, 2001).
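To make the idea of a membership function concrete, the following is a minimal sketch, not the construction used later in the chapter. It assumes piecewise-linear MFs (two shoulders and one triangle) for three hypothetical linguistic terms, "low", "medium" and "high", defined over an illustrative numerical attribute. The function names and the breakpoints 10, 20 and 30 are assumptions chosen purely for illustration.

```python
def left_shoulder_mf(x, peak, right):
    """Full membership up to `peak`, falling linearly to 0 at `right`."""
    if x <= peak:
        return 1.0
    if x >= right:
        return 0.0
    return (right - x) / (right - peak)


def triangular_mf(x, left, peak, right):
    """Membership rising from 0 at `left` to 1 at `peak`, then back to 0 at `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def right_shoulder_mf(x, left, peak):
    """Membership rising linearly from 0 at `left` to 1 at `peak`, full beyond."""
    if x >= peak:
        return 1.0
    if x <= left:
        return 0.0
    return (x - left) / (peak - left)


def fuzzify(x):
    """Map a numerical value to membership grades for each linguistic term.

    The breakpoints (10, 20, 30) are hypothetical, not taken from the chapter.
    """
    return {
        "low": left_shoulder_mf(x, 10.0, 20.0),
        "medium": triangular_mf(x, 10.0, 20.0, 30.0),
        "high": right_shoulder_mf(x, 20.0, 30.0),
    }


if __name__ == "__main__":
    for value in (5.0, 15.0, 20.0, 27.5):
        print(value, fuzzify(value))
```

Under these assumed breakpoints, a value of 15 is associated equally with "low" and "medium" (grade 0.5 each) and not at all with "high", which illustrates how a single numerical value can belong partially to more than one linguistic term.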
The problem considered in this study concerns road travel in the US, namely the discernment of the levels of traffic fatalities across the individual states. This issue has attracted much attention (see Noland, 2003), one reason being that traffic accidents account for a significant proportion of premature fatalities in the US (and in most other developed countries, see Shemer, 2004). As such, they have been the focus of study in many scientific fields, from accident prevention to economic, social and behavioural analysis (Zobeck, Grant, Stinson, & Bettolucci, 1994; Washington, Metarko, Fomumung, Ross, Julian, & Moran, 1999; Farmer & Williams, 2005).