Experimental Study I: Automobile Dataset

DOI: 10.4018/978-1-5225-5029-7.ch004

Abstract

This chapter implements the proposed model on the Automobile dataset. It first covers pattern extraction from the dataset, following the series of steps described in the proposed model chapter. It then details pattern prediction from the Automobile dataset for numeric variables, nominal variables, and aggregate data, which likewise follows the steps described earlier.

4.1 Dataset Introduction

This case study implements the model on the Automobile dataset, which is available from the UCI Machine Learning Repository (Asuncion & Newman, 2007). The dataset mixes numeric and nominal variables and contains 26 variables in total: 11 nominal (categorical) and 15 numeric. The nominal variables, along with their distinct values, are given in Table 1. The numeric variables are Normalized Losses, Wheel Base, Length, Width, Height, Curb Weight, Engine Size, Bore, Stroke, Compression Ratio, Horse Power, Peak RPM, City MPG, Highway MPG and Price. (Engine Type is nominal and therefore appears in Table 1.) This standard dataset describes the characteristics of automobiles. More details of the dataset are available on the UCI Machine Learning Repository website.
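The split into nominal and numeric variables can be sketched in a few lines of Python. This is only an illustrative sketch: the column names follow the dataset's documentation at UCI (the raw file has no header row and marks missing values with `?`), the nominal set follows Table 1, and the helper name `parse_row` and the single sample record are introduced here for illustration.

```python
import csv
import io

# Column order as documented for the UCI Automobile file (no header row).
COLUMNS = [
    "symboling", "normalized-losses", "make", "fuel-type", "aspiration",
    "num-of-doors", "body-style", "drive-wheels", "engine-location",
    "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
    "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
    "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
    "price",
]

# The 11 nominal variables of Table 1; every other column is treated as numeric.
NOMINAL = {
    "make", "fuel-type", "aspiration", "num-of-doors", "body-style",
    "drive-wheels", "engine-location", "engine-type", "num-of-cylinders",
    "fuel-system", "symboling",
}
NUMERIC = [c for c in COLUMNS if c not in NOMINAL]


def parse_row(fields):
    """Split one raw record into (nominal, numeric) dicts; '?' becomes None."""
    nominal, numeric = {}, {}
    for name, raw in zip(COLUMNS, fields):
        if name in NOMINAL:
            nominal[name] = None if raw == "?" else raw
        else:
            numeric[name] = None if raw == "?" else float(raw)
    return nominal, numeric


# One record in the file's comma-separated layout, shown for illustration.
SAMPLE = (
    "3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,"
    "48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495\n"
)
rows = [parse_row(fields) for fields in csv.reader(io.StringIO(SAMPLE))]
nominal, numeric = rows[0]
print(len(NOMINAL), "nominal and", len(NUMERIC), "numeric variables")
print(nominal["make"], numeric["price"])
```

Keeping the numeric values in their own dictionary makes it straightforward to pass only those variables to the clustering step, while the nominal values remain available for later analysis.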

4.1.1 Generate Hierarchical Clusters

In the first step of the model, Agglomerative Hierarchical Clustering is applied to the entire dataset, using the numeric variables, to generate clusters at different levels of the hierarchy. A detailed discussion of this step has already been provided in the proposed model chapter. The resulting clusters are numbered manually at each level and presented as a tree structure in Figure 1.
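The clustering step can be illustrated with a minimal, pure-Python sketch. It is not the chapter's implementation: the chapter does not state the distance measure or linkage criterion, so Euclidean distance and average linkage are assumptions here, and the function name `agglomerate` and the five two-dimensional points are hypothetical. The sketch returns the partition at every level of the hierarchy, from the root down to single records.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Bottom-up clustering of `points` (tuples of numbers).

    Returns a list of partitions, coarsest first: index 0 holds one cluster
    with every record, the last index holds one cluster per record.
    Assumes Euclidean distance and average linkage.
    """
    def avg_link(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Merge the two closest clusters and record the new partition.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_link(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(sorted(a + b))
        levels.append([list(c) for c in clusters])
    return levels[::-1]


# Hypothetical records described by two numeric variables.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (6.0, 6.2)]
levels = agglomerate(points)
for depth, partition in enumerate(levels):
    print(depth, partition)
```

Each entry of `levels` corresponds to one level of a tree such as the one in Figure 1: cutting the hierarchy at depth 1 gives the two children of the root, depth 2 gives three clusters, and so on. On real data the numeric variables would normally be put on a common scale first, since variables such as Price would otherwise dominate the distance.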

Table 1. Nominal variables along with their distinct values from Automobile Dataset

Nominal Variables | Distinct Values
Make | alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, volvo
Fuel Type | diesel, gas
Aspiration | std, turbo
Number of doors | four, two
Body Style | hardtop, wagon, sedan, hatchback, convertible
Drive Wheels | 4wd, fwd, rwd
Engine Location | front, rear
Engine Type | dohc, dohcv, l, ohc, ohcf, ohcv, rotor
No. of Cylinders | eight, five, four, six, three, twelve, two
Fuel System | 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi
Symboling | -3, -2, -1, 0, 1, 2, 3

Figure 1. Tree structure of hierarchical clusters

It is important to note that different variables are involved in the split of the clusters at each level. Moreover, each variable has a different variance in each cluster at the same level. If all numeric variables (Normalized Losses, Wheel Base, Length, Width, Height, Curb Weight, Engine Size, Bore, Stroke, Compression Ratio, Horse Power, Peak RPM, City MPG, Highway MPG and Price) were used to define the split of C1 into C11 and C12, then a variable that influences the split will have a greater variance in one of the child clusters and a smaller variance in the other. At this stage, all data in the example dataset is distributed across clusters at different levels of the hierarchy.
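The variance comparison between two child clusters can be sketched as follows. This is an illustrative sketch only: the records assigned to C11 and C12 are hypothetical values, not taken from the dataset, and ranking variables by the ratio of the larger to the smaller variance is an assumed heuristic for expressing "greater in one child, smaller in the other", not a measure defined in the chapter.

```python
from statistics import pvariance


def variance_by_cluster(children, variables):
    """Population variance of each variable inside each child cluster.

    `children` maps a cluster name to a list of records (dicts).
    """
    return {
        var: {name: pvariance([rec[var] for rec in recs])
              for name, recs in children.items()}
        for var in variables
    }


def variance_ratio(per_cluster):
    """Larger variance divided by smaller variance (assumed heuristic)."""
    low, high = min(per_cluster.values()), max(per_cluster.values())
    return float("inf") if low == 0 else high / low


# Hypothetical members of the two children of C1.
children = {
    "C11": [{"price": 7000, "height": 52.0},
            {"price": 7500, "height": 54.0},
            {"price": 8000, "height": 56.0}],
    "C12": [{"price": 15000, "height": 52.5},
            {"price": 25000, "height": 54.0},
            {"price": 35000, "height": 55.5}],
}

variances = variance_by_cluster(children, ["price", "height"])
ranking = sorted(variances, key=lambda v: variance_ratio(variances[v]), reverse=True)
for var in ranking:
    print(var, variances[var], round(variance_ratio(variances[var]), 2))
```

In this toy example the variance of `price` differs sharply between C11 and C12 while the variance of `height` is similar in both, so `price` would be the candidate variable associated with the split.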
