A Disk-Based Algorithm for Fast Outlier Detection in Large Datasets

Faxin Zhao (Northeastern University, China and Tonghua Teachers College, China), Yubin Bao (Northeastern University, China), Huanliang Sun (Northeastern University, China) and Ge Yu (Northeastern University, China)
Copyright: © 2007 | Pages: 15
DOI: 10.4018/978-1-59904-120-9.ch002

Abstract

Outlier detection is an important research issue in data mining. In the cell-based disk algorithm, the number of cells grows exponentially with dimensionality, so performance degrades sharply as the number of cells and data points increases. Further analysis shows that many of these cells are empty and contribute nothing to outlier detection. This chapter therefore proposes a novel index structure, the CD-Tree, which stores only non-empty cells, and adopts a clustering technique that stores the data objects of the same cell in linked disk pages. Experiments were conducted to test the performance of the proposed algorithms. The results show that the disk algorithm based on the CD-Tree and the clustering technique outperforms the cell-based disk algorithm and can handle higher dimensionality than the earlier algorithm.
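
As a rough illustration of why storing only non-empty cells helps, the following Python sketch compares the number of occupied cells with the number of cells a full grid would allocate. It is not the authors' CD-Tree: the function names (`build_sparse_cells`, `dense_cell_count`), the uniform `cell_width`, and the in-memory dictionary are all assumptions made for this example, and the disk-page clustering described in the abstract is not modeled.

```python
from collections import defaultdict
from math import floor


def build_sparse_cells(points, cell_width):
    """Group points by grid cell, keeping entries only for non-empty cells.

    Hypothetical helper: a dictionary stands in for the index structure.
    """
    cells = defaultdict(list)
    for p in points:
        # The cell key is the tuple of integer grid coordinates of the point.
        key = tuple(floor(x / cell_width) for x in p)
        cells[key].append(p)
    return dict(cells)


def dense_cell_count(points, cell_width):
    """Count the cells a full grid over the data's bounding box would need."""
    dims = len(points[0])
    count = 1
    for d in range(dims):
        lo = floor(min(p[d] for p in points) / cell_width)
        hi = floor(max(p[d] for p in points) / cell_width)
        count *= hi - lo + 1  # cells along this dimension
    return count


# Three 3-dimensional points: two close together, one far away.
points = [(0.1, 0.2, 0.3), (0.15, 0.25, 0.35), (9.9, 9.8, 9.7)]
cells = build_sparse_cells(points, cell_width=1.0)

print(len(cells), "non-empty cells out of", dense_cell_count(points, 1.0))
```

In this toy input the full grid has 10 cells per dimension, so its size is 10 raised to the number of dimensions, while the sparse mapping holds only the two occupied cells.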

Complete Chapter List

Table of Contents
Acknowledgments
Zongmin Ma
Chapter 1
Ioannis N. Kouris, Christos H. Makris, Athanasios K. Tsakalidis
Most algorithms and approaches dealing with data mining in general and especially those focusing on the task of association rule mining have assumed...
Uncovering Hidden Associations Through Negative Itemsets Correlations
Chapter 2
Faxin Zhao, Yubin Bao, Huanliang Sun, Ge Yu
In data mining fields, outlier detection is an important research issue. The number of cells in the cell-based disk algorithm increases...
A Disk-Based Algorithm for Fast Outlier Detection in Large Datasets
Chapter 3
Tzung-Pei Hong, Ching-Yao Wang
Developing an efficient mining algorithm that can incrementally maintain discovered information as a database grows is quite important in the field...
Maintenance of Association Rules Using Pre-Large Itemsets
Chapter 4
Rosa Meo, Giuseppe Psaila
Inductive databases have been proposed as general purpose databases to support the KDD process. Unfortunately, the heterogeneity of the discovered...
An XML-Based Database for Knowledge Discovery: Definition and Implementation
Chapter 5
Marcus Costa Sampaio, Cláudio de Souza Baptista, André Gomes de Sousa, Fabiana Ferreira do Nascimento
This chapter introduces spatial dimensions and measures as a means of enhancing decision support systems with spatial capabilities. By some way or...
Enhancing Decision Support Systems with Spatial Capabilities
Chapter 6
S.A. Oke
This work demonstrates the application of decision tree, a data mining tool, in the manufacturing system. Data mining has the capability for...
Application of Decision Tree as a Data Mining Tool in a Manufacturing System
Chapter 7
Gian Piero Zarri
In this chapter, we evoke first the ubiquity and the importance of the so-called ‘narrative’ information, showing that the usual ontological tools...
An Implemented Representation and Reasoning System for Creating and Exploiting Large Knowledge Bases of Narrative Information
Chapter 8
Z. M. Ma
Fuzzy set theory has been extensively applied to extend various data models and resulted in numerous contributions, mainly with respect to the...
A Literature Overview of Fuzzy Database Modeling
Chapter 9
J. Gerard Wolff
This chapter describes some of the kinds of “intelligence” that may be exhibited by an intelligent database system based on the SP theory of...
Aspects of Intelligence in an "SP" Database System
Chapter 10
Davide Martinenghi, Henning Christiansen, Hendrik Decker
Integrity constraints are a key tool for characterizing the well-formedness and semantics of the information contained in databases. In this regard...
Integrity Checking and Maintenance in Relational and Deductive Databases and Beyond
Chapter 11
Hassina Bounif
Information systems, including their core databases, need to meet changing user requirements and adhere to evolving business strategies. Traditional...
Predictive Approach for Database Schema Evolution
About the Authors