V. Suresh Babu (Indian Institute of Technology-Guwahati, India), P. Viswanath (Indian Institute of Technology-Guwahati, India) and Narasimha M. Murty (Indian Institute of Science, India)

Copyright: © 2009
Pages: 6

DOI: 10.4018/978-1-60566-010-3.ch260

Chapter Preview

Non-parametric methods such as the nearest neighbor classifier (NNC) and Parzen-window density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they make no assumptions about the form of the probability distribution. They also perform well in practice, especially with large data sets. These methods, either explicitly or implicitly, estimate the probability density at a given point in a feature space by counting the number of points that fall in a small region around that point. Popular classifiers that use this approach are the NNC and its variants, such as the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork, 2000), while DBSCAN is a popular density-based clustering method (Han & Kamber, 2001) that uses the same approach. The asymptotic error rate of the NNC is at most twice the Bayes error (Cover & Hart, 1967), and DBSCAN can find arbitrarily shaped clusters while also detecting noisy outliers (Ester, Kriegel & Xu, 1996).
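To make the counting idea concrete, here is a minimal Python sketch (not taken from the chapter) of a hypercube Parzen-window density estimate and a brute-force k-NNC. The function names, the Euclidean distance, and the hypercube window of side `h` are illustrative assumptions. Both functions scan every training pattern for each query point, which is where the per-query *O(n)* cost comes from.

```python
import math
from collections import Counter

def parzen_density(x, data, h):
    """Hypercube Parzen-window estimate at x: the fraction of points that
    fall inside a cube of side h centred at x, divided by its volume h**d."""
    d = len(x)
    inside = sum(1 for p in data
                 if all(abs(pi - xi) <= h / 2 for pi, xi in zip(p, x)))
    return inside / (len(data) * h ** d)

def knn_classify(x, train, labels, k):
    """Brute-force k-NNC: ranks all n training patterns by Euclidean
    distance to x and returns the majority label among the k nearest."""
    order = sorted(range(len(train)), key=lambda i: math.dist(x, train[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy two-class data (hypothetical).
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify((0.15, 0.1), train, labels, k=3))  # a
print(parzen_density((0.1, 0.1), train, h=0.5))       # 2.0
```

With an odd `k` and two classes the vote cannot tie; with more classes, `Counter.most_common` breaks ties by first occurrence, which is one of several possible tie-breaking rules.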

The most prominent difficulty in applying non-parametric methods to large data sets is their computational burden. The space and classification time complexities of NNC and k-NNC are *O(n)*, where *n* is the training set size, and the time complexity of DBSCAN is *O(n²)*. Consequently, these methods do not scale well to large data sets. Some remedies for reducing this burden are as follows. (1) Reduce the training set size with editing techniques that eliminate training patterns that are redundant in some sense (Dasarathy, 1991); the condensed NNC (Hart, 1968) is of this type. (2) Use only a few selected prototypes from the data set.
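As an illustration of editing, Hart's condensed NNC can be sketched as follows, assuming Euclidean distance and 1-NN classification; `condensed_nn` and `nearest_label` are hypothetical names. The store starts with one pattern, and every pattern that the current store misclassifies is added to it. Passes over the data repeat until a full pass adds nothing, so the returned subset classifies every training pattern correctly under 1-NN.

```python
import math

def nearest_label(x, store):
    """1-NN label of x using only the (pattern, label) pairs in store."""
    return min(store, key=lambda s: math.dist(x, s[0]))[1]

def condensed_nn(train, labels):
    """Sketch of Hart's condensed nearest neighbor editing.
    Assumes train is non-empty; returns a list of (pattern, label) pairs."""
    store = [(train[0], labels[0])]   # seed the store with the first pattern
    in_store = {0}
    changed = True
    while changed:                    # repeat passes until nothing is added
        changed = False
        for i, (x, y) in enumerate(zip(train, labels)):
            if i in in_store:
                continue
            if nearest_label(x, store) != y:   # misclassified: keep it
                store.append((x, y))
                in_store.add(i)
                changed = True
    return store

train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(len(condensed_nn(train, labels)))  # 2
```

The condensed set depends on the order in which patterns are scanned, and it is generally not the smallest consistent subset; its benefit is that later 1-NN queries scan only the store rather than all *n* patterns.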

Using a few selected prototypes can reduce the computational burden. Prototypes can be derived with a clustering method such as the leaders method (Spath, 1980) or the *k*-means method (Jain, Dubes & Chen, 1987), which finds a partition of the data set in which each block (cluster) is represented by a prototype called a leader, centroid, *etc*. However, these prototypes cannot be used to estimate the probability density, since the density information present in the data set is lost while deriving them. The chapter proposes a modified leader clustering method, called the *counted-leader* method, which, along with deriving the leaders, preserves the crucial density information in the form of a *count* that can be used to estimate densities. The chapter presents a fast and efficient nearest-prototype-based classifier, called the *counted k-nearest leader classifier (ck-NLC)*, which performs on par with the conventional k-NNC but is considerably faster. The chapter also presents a density-based clustering method called *l*-DBSCAN, which is shown to be a faster and more scalable version of DBSCAN (Viswanath & Rajwala, 2006). Formally, under some assumptions, it is shown that the number of leaders is upper-bounded by a constant that is independent of both the data set size and the distribution from which the data set is drawn.
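The counted-leader idea can be illustrated with a sketch of the classical single-pass leaders method extended with a per-leader count. The chapter's exact procedure is not given in this preview, so the distance threshold `tau`, the first-match assignment rule, and the way the hypothetical `leader_density` function uses the counts (treating each leader as `count` coincident points inside a hypercube Parzen window) are assumptions made for illustration only.

```python
import math

def counted_leaders(data, tau):
    """Single-pass leaders clustering that also records, for each leader,
    how many patterns it represents (its count)."""
    leaders, counts = [], []
    for x in data:
        for j, l in enumerate(leaders):
            if math.dist(x, l) <= tau:   # first leader within tau absorbs x
                counts[j] += 1
                break
        else:                            # no leader within tau
            leaders.append(x)            # x becomes a new leader
            counts.append(1)
    return leaders, counts

def leader_density(x, leaders, counts, h):
    """Assumed use of the counts: a hypercube Parzen-window estimate in which
    each leader stands in for `count` patterns located at the leader."""
    n = sum(counts)                      # equals the original data set size
    d = len(x)
    inside = sum(c for l, c in zip(leaders, counts)
                 if all(abs(li - xi) <= h / 2 for li, xi in zip(l, x)))
    return inside / (n * h ** d)

data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
leaders, counts = counted_leaders(data, tau=0.3)
print(leaders, counts)                                   # [(0.0, 0.0), (1.0, 1.0)] [3, 3]
print(leader_density((0.1, 0.1), leaders, counts, h=0.5))  # 2.0
```

Because a pattern becomes a new leader only when it is farther than `tau` from every existing leader, any two leaders in this sketch are more than `tau` apart. In a bounded region of feature space, a packing argument then caps the number of leaders no matter how many patterns are scanned, which is consistent with the kind of bound stated above.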

Chapter 260

Scalable Non-Parametric Methods for Large Data Sets
(pages 1708-1713)
