A Local Approach and Comparison with Other Data Mining Approaches in Software Application

QingE Wu (Zhengzhou University of Light Industry, China) and Weidong Yang (Fudan University, China)
DOI: 10.4018/978-1-5225-1884-6.ch001


In order to achieve online, real-time, and effective aging detection for software, this paper studies two data mining approaches, a local approach (also called the fuzzy incomplete approach) and a statistical approach, and gives their algorithmic implementation for software system fault diagnosis. The two approaches are then compared with four classical data mining approaches in software system fault diagnosis. The performance of each approach is evaluated in terms of sensitivity, specificity, accuracy rate, error classified rate, missed classified rate, and run time, and an optimal approach is chosen from among them for comparative study. On data of 1020 samples, the results show that the fuzzy incomplete approach has the highest sensitivity and forecast accuracy, 96.13% and 94.71% respectively, higher than those of the other approaches. It also has a relatively low error classified rate of about 4.12%, the lowest missed classified rate of about 1.18%, and the shortest run time of 0.35 s, all lower than those of the other approaches. After all performance indices are evaluated and synthesized, the results indicate that the fuzzy incomplete approach performs best. Moreover, the test analysis shows that the fuzzy incomplete approach has further advantages: faster detection, lower storage requirements, and no need for any prior information other than the data being processed. These results indicate that this mining approach is more effective and feasible than the older data mining approaches in software aging detection.
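The abstract reports five classification indices but this preview does not define the two "classified rate" indices. The reported figures for the fuzzy incomplete approach sum to about 100% (94.71 + 4.12 + 1.18 = 100.01), which is consistent with, though does not prove, reading the error classified rate as false alarms over all samples and the missed classified rate as missed faults over all samples. The minimal sketch below computes the indices from a binary confusion matrix under that assumption, treating "fault" as the positive class; the `Confusion` type, the `evaluate` function, and the counts are hypothetical illustrations, not the chapter's data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary fault / no-fault classifier (fault = positive class)."""
    tp: int  # faults correctly flagged
    fn: int  # faults missed (predicted normal)
    fp: int  # normal samples wrongly flagged as faults
    tn: int  # normal samples correctly passed


def evaluate(c: Confusion) -> dict:
    """Compute the five classification indices named in the abstract."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        # ASSUMED definition: false alarms as a share of all samples.
        "error_classified_rate": c.fp / total,
        # ASSUMED definition: missed faults as a share of all samples.
        "missed_classified_rate": c.fn / total,
    }


# Hypothetical counts for 200 samples, not data from the chapter.
metrics = evaluate(Confusion(tp=90, fn=10, fp=5, tn=95))
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these toy counts the script prints a sensitivity of 90.00%, specificity of 95.00%, accuracy of 92.50%, error classified rate of 2.50%, and missed classified rate of 5.00%. Under the assumed definitions, accuracy, error classified rate, and missed classified rate always add up to 100%.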
Chapter Preview


Because of the rapid growth of measurement data in engineering applications and the involvement of humans, the uncertainty of information in data is more prominent, and the relationships among data are more complex. How to mine potentially useful information from large volumes of fuzzy, disordered, unsystematic, and heavily noisy data, so as to support real-time and effective engineering applications, is a problem that urgently needs further study.

Data mining is a process of selecting, exploring, and modeling large amounts of data to discover previously unknown rules and relations, with the goal of obtaining clear and useful results for the owner of the database (Giudici et al., 2004).

Data mining spread very quickly, and its range of applications widened day by day (Giudici et al., 2004, Liang 2006, Zhang et al., 2008, Hu et al., 2008, Liao and Yang, 2009, Chen et al., 2008). These works presented several data mining algorithms and some engineering applications, and introduced three data mining algorithms for medical applications. However, the data mining industry in China was still at an early stage of development, and domestic industries basically did not have their own data mining systems.

In 1989 (Arai, 1989), at the 11th International Symposium on Artificial Intelligence, scholars first proposed the concept of knowledge discovery in databases (KDD). At an annual computer conference in the United States in 1995, some scholars began to regard data mining as a fundamental step in knowledge discovery in databases, or treated the two terms as synonyms.

Some data mining algorithms are now relatively mature (Arai, 1989), (Farzanyar, Kangavari et al., 2012), (Qiu and Tamhane, 2007), (Wolff, Bhaduri et al., 2009), (Balzano and Del Sorbo, 2007), (Alp, Büyükbebeci et al., 2011). In the CHAID-based decision tree algorithm, rules generated by Scenario can be applied to an unclassified data set to predict which records will have promising outcomes. Scenario's decision tree algorithm is very flexible: it lets the user choose to split on any variable, or to split according to statistical significance. Graphical analysis of the raw data was carried out using line charts, histograms, and scatter plots. Liang Xun listed several major data mining software developers (Liang, 2006).
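Splitting "according to statistical significance" in CHAID (chi-squared automatic interaction detection) rests on a chi-square test of association between each candidate predictor and the target. The sketch below shows only that split-selection step, in simplified form: full CHAID also merges similar predictor categories and compares Bonferroni-adjusted p-values, which are omitted here. It assumes every predictor has the same number of categories, so equal degrees of freedom make ranking by the chi-square statistic equivalent to ranking by p-value. It is not Scenario's implementation, and the field names (`high_mem`, `long_uptime`, `fault`) and records are hypothetical.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic for the contingency table of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n  # count expected under independence
            observed = joint[(x, y)]
            stat += (observed - expected) ** 2 / expected
    return stat


def best_split(records, predictors, target):
    """Pick the predictor most strongly associated with the target.

    Simplification: all predictors are assumed to have the same number of
    categories, so the largest statistic corresponds to the smallest p-value.
    """
    scores = {p: chi_square([(r[p], r[target]) for r in records]) for p in predictors}
    return max(scores, key=scores.get), scores


# Hypothetical software-aging records: binary predictors, binary fault label.
records = [
    {"high_mem": 1, "long_uptime": 1, "fault": 1},
    {"high_mem": 1, "long_uptime": 0, "fault": 1},
    {"high_mem": 1, "long_uptime": 1, "fault": 1},
    {"high_mem": 0, "long_uptime": 1, "fault": 0},
    {"high_mem": 0, "long_uptime": 0, "fault": 0},
    {"high_mem": 0, "long_uptime": 0, "fault": 0},
]
best, scores = best_split(records, ["high_mem", "long_uptime"], "fault")
print(best, scores)
```

Here `high_mem` separates faulty from healthy records perfectly (statistic 6.0), while `long_uptime` is only weakly associated (about 0.67), so the sketch splits on `high_mem` first.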

This paper introduces two new data mining approaches and uses them, together with four classical supervised-learning data mining techniques, to learn and classify 1020 data samples. It validates the feasibility and effectiveness of the new approaches and compares the performance of all the approaches with one another, in order to select an optimal mining approach for fault diagnosis in software systems. Neural networks (NN), support vector machines (SVM), decision trees, and logistic regression are among the best data mining approaches for describing the nonlinearity of data; the fuzzy incomplete and statistical approaches can also describe this nonlinearity, so they are well suited to the characteristics of fault diagnosis data in software systems. This paper evaluates the performance of each approach in terms of sensitivity, specificity, accuracy, error classified rate, and missed classified rate, and also records the running time on a Pentium 4 2.66 GHz machine with 1 GB of memory. These six indices serve as the standards for judging the advantages and disadvantages of each approach, and the approach with the best overall performance is selected for fault diagnosis in software systems.
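This preview says the six indices are "evaluated and synthesized" but does not state the synthesis rule. One simple, hypothetical way to combine them is a rank sum: rank every approach on each index (1 = best, with higher better for sensitivity, specificity, and accuracy and lower better for the two error rates and run time), add the ranks, and pick the approach with the smallest total. The sketch uses competition ranking, so tied approaches share the better rank. The approach names `A`, `B`, `C` and all numbers are invented for illustration and are not results from the chapter.

```python
# Direction of each index: True if a higher value is better.
HIGHER_IS_BETTER = {
    "sensitivity": True,
    "specificity": True,
    "accuracy": True,
    "error_classified_rate": False,
    "missed_classified_rate": False,
    "runtime_s": False,
}


def rank_sum(results):
    """Sum per-index competition ranks (1 = best); the lowest total wins."""
    totals = {name: 0 for name in results}
    for index, higher in HIGHER_IS_BETTER.items():
        for name, scores in results.items():
            value = scores[index]
            # Count the approaches that are strictly better on this index.
            better = sum(
                1
                for other in results.values()
                if (other[index] > value if higher else other[index] < value)
            )
            totals[name] += 1 + better
    return min(totals, key=totals.get), totals


# Hypothetical index values for three approaches.
results = {
    "A": {"sensitivity": 0.95, "specificity": 0.93, "accuracy": 0.94,
          "error_classified_rate": 0.04, "missed_classified_rate": 0.02, "runtime_s": 0.4},
    "B": {"sensitivity": 0.90, "specificity": 0.95, "accuracy": 0.92,
          "error_classified_rate": 0.03, "missed_classified_rate": 0.05, "runtime_s": 1.2},
    "C": {"sensitivity": 0.88, "specificity": 0.90, "accuracy": 0.89,
          "error_classified_rate": 0.07, "missed_classified_rate": 0.04, "runtime_s": 0.8},
}
best, totals = rank_sum(results)
print(best, totals)
```

Approach `A` wins with a rank sum of 8 (versus 12 for `B` and 16 for `C`) even though `B` is better on specificity and error classified rate. An equal-weight rank sum ignores how large the gaps are, so a weighted score may be preferable when some indices, such as missed faults, matter more in practice.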
