An Algebraic Approach to Data Quality Metrics for Entity Resolution over Large Datasets

John Talburt (University of Arkansas at Little Rock, USA), Richard Wang (Massachusetts Institute of Technology, USA), Kimberly Hess (CASA 20th Judicial District, USA) and Emily Kuo (Massachusetts Institute of Technology, USA)
DOI: 10.4018/978-1-59904-951-9.ch196

This chapter introduces abstract algebra as a means of understanding and creating data quality metrics for entity resolution, the process in which records determined to represent the same real-world entity are successively located and merged. Entity resolution is a particular form of data mining that is foundational to a number of applications in both industry and government. Examples include commercial customer recognition systems and information sharing on “persons of interest” across federal intelligence agencies. Despite the importance of these applications, most of the data quality literature focuses on measuring the intrinsic quality of individual records rather than the quality of record grouping or integration. In this chapter, the authors describe current research into the creation and validation of quality metrics for entity resolution, primarily in the context of customer recognition systems. The approach is based on an algebraic view of the system as creating a partition of a set of entity records based on the indicative information for the entities in question. In this view, the relative quality of entity identification between two systems can be measured in terms of the similarity between the partitions they produce. The authors discuss the difficulty of applying statistical cluster analysis to this problem when the datasets are large and propose an alternative index suitable for these situations. They also report some preliminary experimental results and outline areas and approaches for further research.
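
The partition view described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the record identifiers, the function names, and the choice of measures are assumptions made for illustration. `rand_index` is the standard pair-counting Rand index from cluster analysis, whose cost grows quadratically with the number of records; `overlap_index` is one simple overlap-counting measure that needs only a single pass over the records, shown here as an example of a cheaper comparison and not necessarily the index the chapter proposes.

```python
from itertools import combinations
from math import sqrt


def label_map(partition):
    """Map each record to the index of the block that contains it."""
    return {rec: i for i, block in enumerate(partition) for rec in block}


def rand_index(p, q):
    """Pair-counting Rand index: fraction of record pairs on which the
    two partitions agree (both group the pair, or both separate it).
    Examines every pair, so the cost is quadratic in the record count."""
    a, b = label_map(p), label_map(q)
    agree = total = 0
    for x, y in combinations(sorted(a), 2):
        total += 1
        if (a[x] == a[y]) == (b[x] == b[y]):
            agree += 1
    return agree / total


def overlap_index(p, q):
    """Illustrative overlap-count measure (an assumption, see lead-in):
    sqrt(|P| * |Q|) divided by the number of non-empty block overlaps.
    Equals 1.0 when the partitions are identical; one pass over records."""
    a, b = label_map(p), label_map(q)
    overlaps = {(a[r], b[r]) for r in a}
    return sqrt(len(p) * len(q)) / len(overlaps)


# Two hypothetical systems resolving the same five records.
system_a = [{"r1", "r2"}, {"r3"}, {"r4", "r5"}]
system_b = [{"r1", "r2", "r3"}, {"r4"}, {"r5"}]

print(rand_index(system_a, system_b))
print(overlap_index(system_a, system_b))
```

Both functions assume the two partitions cover exactly the same set of records, as they would when two entity resolution systems process the same input file.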

Complete Chapter List

Editorial Advisory Board
Chapter 1
James E. Yao, Chang Liu, Qiyang Chen, June Lu
As internal and external demands on information from managers are increasing rapidly, especially the information that is processed to serve... Sample PDF
Administering and Managing a Data Warehouse
Chapter 2
Rick L. Wilson, Peter A. Rosen, Mohammad Saad Al-Ahmadi
Considerable research has been done in the recent past that compares the performance of different data mining techniques on various data sets (e.g.... Sample PDF
Knowledge Structure and Data Mining Techniques
Chapter 3
James E. Yao, Chang Liu, Qiyang Chen, June Lu
As internal and external demands on information from managers are increasing rapidly, especially the information that is processed to serve... Sample PDF
Administering and Managing a Data Warehouse
Chapter 4
Yong Shi, Yi Peng, Gang Kou, Zhengxin Chen
This chapter provides an overview of a series of multiple criteria optimization-based data mining methods, which utilize multiple criteria... Sample PDF
Introduction to Data Mining Techniques via Multiple Criteria Optimization Approaches and Applications
Chapter 5
Stanley R.M. Oliveira, Osmar R. Zaiane
Privacy-preserving data mining (PPDM) is one of the newest trends in privacy and security research. It is driven by one of the major policy issues... Sample PDF
Privacy-Preserving Data Mining on the Web: Foundations and Techniques
Chapter 6
Grigorios Tsoumakas, Ioannis Katakis
Multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization... Sample PDF
Multi-Label Classification: An Overview
Chapter 7
Héctor Oscar Nigro, Sandra Elizabeth González Císaro
Several approaches for intelligent data analysis are not only available but also tried and tested. Online analytical processing (OLAP) and data... Sample PDF
Online Data Mining
Chapter 8
Nathaniel B. Noriel, Chew Lim Tan
The 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2006 Data Mining Competition involved the problem of classifying... Sample PDF
A Look back at the PAKDD Data Mining Competition 2006
Chapter 9
Hui-Huang Hsu
Bioinformatics uses information technologies to facilitate the discovery of new knowledge in molecular biology. Among the information technologies... Sample PDF
Introduction to Data Mining in Bioinformatics
Chapter 10
Tatsuya Akutsu
This chapter provides an overview of computational problems and techniques for protein threading. Protein threading is one of the most powerful... Sample PDF
Algorithmic Aspects of Protein Threading
Chapter 11
Alex Freitas, André C.P.L.F. de Carvalho
In machine learning and data mining, most of the works in classification problems deal with flat classification, where each instance is classified... Sample PDF
A Tutorial on Hierarchical Classification with Applications in Bioinformatics
Chapter 12
Jose D. Montero
This chapter provides a brief introduction to data mining, the data mining process, and its applications to manufacturing. Several examples are... Sample PDF
Introduction to Data Mining and its Applications to Manufacturing
Chapter 13
Jose Hernandez-Orallo
Information systems provide organizations with the necessary information to achieve their goals. Relevant information is gathered and stored to... Sample PDF
Data Warehousing and OLAP
Chapter 14
Prasad M. Deshpande, Karthikeyan Ramasamy
Since the advent of information technology, businesses have been collecting vast amounts of data about their daily transactions. For example, a... Sample PDF
Data Warehousing, Multi-Dimensional Data Models and OLAP
Chapter 15
Z. M. Ma
Fuzzy set theory has been extensively applied to extend various data models and resulted in numerous contributions, mainly with respect to the... Sample PDF
A Literature Overview of Fuzzy Database Modeling
Chapter 16
Stefano Rizzi
In the context of data warehouse design, a basic role is played by conceptual modeling, that provides a higher level of abstraction in describing... Sample PDF
Conceptual Modeling Solutions for the Data Warehouse
Chapter 17
Irene Ntoutsi, Nikos Pelekis, Yannis Theodoridis
Many patterns are available nowadays due to the widespread use of knowledge discovery in databases (KDD), as a result of the overwhelming amount of... Sample PDF
Pattern Comparison in Data Mining: A Survey
Chapter 18
Marinette Bouet, Pierre Gançarski, Omar Boussaïd
Analysing and mining image data to derive potentially useful information is a very challenging task. Image mining concerns the extraction of... Sample PDF
Pattern Mining and Clustering on Image Databases
Chapter 19
Dinesh Batra
The tremendous demand for software productivity has led to the idea of reuse of solutions that have worked successfully in the past. The notion of a... Sample PDF
Conceptual Data Modeling Patterns: Representation and Validation
Chapter 20
Haorianto Cokrowijoyo Tjioe, David Taniar
Data mining applications have enormously altered the strategic decision-making processes of organizations. The application of association rules... Sample PDF
Mining Association Rules in Data Warehouses
Chapter 21
Olena Daly, David Taniar
Data mining is a process of discovering new, unexpected, valuable patterns from existing databases (Frawley, Piatetsky-Shapiro, & Matheus, 1991).... Sample PDF
Exception Rules in Data Mining
Chapter 22
Karim K. Hirji
In contrast to the Industrial Revolution, the Digital Revolution is happening much more quickly. For example, in 1946, the world’s first... Sample PDF
Process-Based Data Mining
Chapter 23
Andreas Koeller
Integration of data sources refers to the task of developing a common schema as well as data transformation solutions for a number of data sources... Sample PDF
Integration of Data Sources through Data Mining
Chapter 24
Nikunj C. Oza
Ensemble data mining methods, also known as committee methods or model combiners, are machine learning methods that leverage the power of multiple... Sample PDF
Ensemble Data Mining Methods
Chapter 25
Paolo Giudici
Several classes of computational and statistical methods for data mining are available. Each class can be parameterised so that models within the... Sample PDF
Evaluation of Data Mining Methods
Chapter 26
Takao Ito
One of the most important issues in data mining is to discover an implicit relationship between words in a large corpus and labels in a large... Sample PDF
Discovering an Effective Measure in Data Mining
Chapter 27
Neerja Sethi, Vijay Sethi
Internet companies are now in the second stage of evolution in which the emphasis is on building brands (Campman, 2001) and retaining customers... Sample PDF
Data Warehousing and Data Mining Lessons for EC Companies
Chapter 28
Les Pang
Data warehousing has been a successful approach for supporting the important concept of knowledge management — one of the keys to organizational... Sample PDF
Best Practices in Data Warehousing from the Federal Perspective
Chapter 29
Alexander Anisimov
This chapter is dedicated to the major managerial, organizational and technological aspects of development of data warehouses in a global... Sample PDF
Decision Support and Data Warehousing: Challenges of a Global Information Environment
Chapter 30
Manuel Serrano, Coral Calero, Mario Piattini
Data warehouses are large repositories that integrate data from several sources for analysis and decision support. Data warehouse quality is... Sample PDF
An Experimental Replication With Data Warehouse Metrics
Chapter 31
Juha Kontio
Reporting is one of the basic processes in all organizations. It provides information for planning and decision making and, on the other hand... Sample PDF
Data Warehousing Solutions for Reporting Problems
Chapter 32
Juan Manuel Dodero, Paloma Diaz, Ignacio Aedo
Knowledge creation or production in a distributed knowledge management system is a collaborative task that needs to be coordinated. A multi-agent... Sample PDF
A Multi-Agent Approach to Collaborative Knowledge Production
Chapter 33
Bernd Knobloch
This chapter introduces a framework for organizational data analysis suited for data-driven and hypotheses-driven problems. It shows why knowledge... Sample PDF
A Framework for Organizational Data Analysis and Organizational Data Mining
Chapter 34
David Camacho, Ricardo Aler, Juan Cuadrado
How to build intelligent robust applications that work with the information stored in the Web is a difficult problem for several reasons which arise... Sample PDF
Rule-Based Parsing for Web Data Extraction
Chapter 35
Vicky Nassis, R. Rajagopalapillai, Tharam S. Dillon, Wenny Rahayu
EXtensible Markup Language (XML) has emerged as the dominant standard in describing and exchanging data among heterogeneous data sources. The... Sample PDF
Conceptual and Systematic Design Approach for XML Document Warehouses
Chapter 36
Ji Zhang, Han Liu, Tok Wang Ling, Robert M. Bruckner, A. Min Tjoa
In this article, we propose a framework, called XAR-Miner, for mining ARs from XML documents efficiently. In XAR-Miner, raw data in the XML document... Sample PDF
A Framework for Efficient Association Rule Mining in XML Data
Chapter 37
Laura Irina Rusu, J. Wenny Rahayu, David Taniar
Developing a data warehouse for XML documents involves two major processes: one of creating it, by processing XML raw documents into a specified... Sample PDF
A Methodology for Building XML Data Warehouses
Chapter 38
Serg Luján-Mora, Juan Trujillo
In previous work, we have shown how to use unified modeling language (UML) as the primary representation mechanism to model conceptual design... Sample PDF
Applying UML for Modeling the Physical Design of Data Warehouses
Chapter 39
Serg Luján-Mora, Juan Trujillo
Several approaches have been proposed to model different aspects of a Data Warehouse (DW) during recent years, such as the modeling of a DW at the... Sample PDF
Physical Modeling of Data Warehouses Using UML Component and Deployment Diagrams: Design and Implementation Issues
Chapter 40
Lionel Savary, Georges Gardarin, Karine Zeitouni
GML is a promising model for integrating geodata within data warehouses. The resulting databases are generally large and require spatial operators... Sample PDF
GeoCache: A Cache for GML Geographical Data
Chapter 41
Juan M. Hernansaez, Juan A. Botia, Antonio F.G. Skarmeta
In this chapter we focus on the three approaches that seem to be the most successful ones in the Web usage mining area: clustering, association... Sample PDF
A Java Technology Based Distributed Software Architecture for Web Usage Mining
Chapter 42
Maria Luisa Damiani, Stefano Spaccapietra
This chapter is concerned with multidimensional data models for spatial data warehouses. Over the last few years different approaches have been... Sample PDF
Spatial Data Warehouse Modelling
Chapter 43
Rodolfo Villarroel, Eduardo Fernandez-Medina, Juan Trujillo, Mario Piattini
Organizations depend increasingly on information systems, which rely upon databases and data warehouses (DWs), which need increasingly more quality... Sample PDF
Designing Secure Data Warehouses
Chapter 44
Bhavani Thuraisingham
This article first describes the privacy concerns that arise due to data mining, especially for national security applications. Then we discuss... Sample PDF
Privacy-Preserving Data Mining: Development and Directions
Chapter 45
Xining Li, Lei Song
Mining information from distributed data sources over the Internet is a growing research area. The introduction of mobile agent paradigm opens a new... Sample PDF
A Service Discovery Model for Mobile Agent Based Distributed Data Mining
Chapter 46
Pedro Furtado
Data Warehouses (DWs) with large quantities of data present major performance and scalability challenges, and parallelism can be used for major... Sample PDF
Node Partitioned Data Warehouses: Experimental Evidence and Improvements
Chapter 47
Matteo Golfarelli, Stefano Rizzi
Though in most data warehousing applications no relevance is given to the time when events are recorded, some domains call for a different behavior.... Sample PDF
Managing Late Measurements in Data Warehouses
Chapter 48
Tho Manh Nguyen, Peter Brezany, A. Min Tjoa, Edgar Weippl
Continuous data streams are information sources in which data arrives in high volume in unpredictable rapid bursts. Processing data streams is a... Sample PDF
Toward a Grid-Based Zero-Latency Data Warehousing Implementation for Continuous Data Streams Processing
Chapter 49
Colleen Cunningham, Il-Yeol Song, Peter P. Chen
CRM is a strategy that integrates concepts of knowledge management, data mining, and data warehousing in order to support an organization’s... Sample PDF
Data Warehouse Design to Support Customer Relationship Management Analysis
Chapter 50
Gianluigi Greco, Antonella Guzzo, Luigi Pontieri
Mining process logs has been increasingly attracting the data mining community, due to the chances the development of process mining techniques can... Sample PDF
An Information-Theoretic Framework for Process Structure and Data Mining
Chapter 51
Longbing Cao, Chengqi Zhang
Extant data mining is based on data-driven methodologies. It either views data mining as an autonomous data-driven, trial-and-error process or only... Sample PDF
Domain-Driven Data Mining: A Practical Methodology
Chapter 52
Dan A. Simovici
This chapter presents data mining techniques that make use of metrics defined on the set of partitions of finite sets. Partitions are naturally... Sample PDF
Metric Methods in Data Mining
Chapter 53
Maribel Yasmina Santos, Luís Alfredo Amaral
Knowledge discovery in databases is a process that aims at the discovery of associations within data sets. The analysis of geo-referenced data... Sample PDF
Mining Geo-Referenced Databases: A Way to Improve Decision-Making
Chapter 54
Peter Brezany, Ivan Janciak, A. Min Tjoa
This chapter introduces an ontology-based framework for automated construction of complex interactive data mining workflows as a means of improving... Sample PDF
Ontology-Based Construction of Grid Data Mining Workflows
Chapter 55
T. Warren Liao
In this chapter, we present genetic algorithm (GA) based methods developed for clustering univariate time series with equal or unequal length as an... Sample PDF
Exploratory Time Series Data Mining by Genetic Clustering
Chapter 56
Jerzy W. Grzymala-Busse, Zdzislaw S. Hippe, Teresa Mroczek, Edward Roj, Boleslaw Skowronski
Results of our research on using two approaches, both based on rough sets, to mining three data sets describing bed caking during the hop extraction... Sample PDF
Two Rough Set Approaches to Mining Hop Extraction Data
Chapter 57
Alfredo Cuzzocrea, Domenico Sacca, Paolo Serafino
Efficiently supporting advanced OLAP visualization of multidimensional data cubes is a novel and challenging research topic, which results to be of... Sample PDF
Semantics-Aware Advanced OLAP Visualization of Multidimensional Data Cubes
Chapter 58
Andreas Maniatis, Panos Vassiliadis, Spiros Skiadopoulos, Yannis Vassiliou, George Mavrogonatos, Ilias Michalarias
Data visualization is one of the major issues of database research. OLAP, a decision support technology, is clearly in the center of this effort.... Sample PDF
A Presentation Model & Non-Traditional Visualization for OLAP
Chapter 59
Adrian Mocan, Emilia Cimpian
In a semantic environment data is described by ontologies and ontology mapping has become a crucial aspect in solving the heterogeneity problems of... Sample PDF
An Ontology-Based Data Mediation Framework for Semantic Environments
Chapter 60
Haya El-Ghalayini, Mohammed Odeh, Richard McClatchey
This article studies the differences and similarities between domain ontologies and conceptual data models and the role that ontologies can play in... Sample PDF
Engineering Conceptual Data Models from Domain Ontologies: A Critical Evaluation
Chapter 61
Sachin Shetty, Min Song, Mansoor Alam
A Bayesian network model is a popular formalism for data mining due to its intuitive interpretation. This chapter presents a semantic genetic... Sample PDF
Data Mining of Bayesian Network Structure Using a Semantic Genetic Algorithm-Based Approach
Chapter 62
Peng-Yeng Yin, Shyong-Jian Shyu, Guan-Shieng Huang, Shuang-Te Liao
With the advent of new sequencing technology for biological data, the number of sequenced proteins stored in public databases has become an... Sample PDF
A Bayesian Framework for Improving Clustering Accuracy of Protein Sequences Based on Association Rules
Chapter 63
Mina Jeong, Doheon Lee
Classification is an important problem in data mining. Given a database of records, each tagged with a class label, a classifier generates a concise... Sample PDF
Improving Classification Accuracy of Decision Trees for Different Abstraction Levels of Data
Chapter 64
Ioannis Liabotis, Babis Theodoulidis, Mohamad Saraaee
Sequences constitute a large portion of data stored in databases. Data mining applications require the ability to process similarity queries over a... Sample PDF
Improving Similarity Search in Time Series Using Wavelets
Chapter 65
Can Yang, Jun Meng, Shanan Zhu
Input selection is an important step in nonlinear regression modeling. By input selection, an interpretable model can be built with less... Sample PDF
Cluster-Based Input Selection for Transparent Fuzzy Modeling
Chapter 66
D. Frank Hsu, Yun-Sheng Chung, Kristal Bruce S.
Combination methods have been investigated as a possible means to improve performance in multi-variable (multi-criterion or multi-objective)... Sample PDF
Combinatorial Fusion Analysis: Methods and Practices of Combining Multiple Scoring Systems
Chapter 67
Z. M. Ma
Information systems have become the nerve center of current computer-based engineering applications, which hereby put the requirements on... Sample PDF
Databases Modeling of Engineering Information
Chapter 68
Lixin Fu
Existing decision tree algorithms need to recursively partition dataset into subsets according to some splitting criteria. For large data sets, this... Sample PDF
Novel Efficient Classifiers Based on Data Cube
Chapter 69
Zhigang Liu, Wenzhong Shi, Deren Li, Qianqing Qin
This paper addresses a new classification technique: Partially Supervised Classification (PSC), which is used to identify a specific land-cover... Sample PDF
Partially Supervised Classification: Based on Weighted Unlabeled Samples Support Vector Machine
Chapter 70
Jaehoon Kim, Seong Park
Much of the research regarding streaming data has focused only on real time querying and analysis of recent data stream allowable in memory.... Sample PDF
Periodic Streaming Data Reduction Using Flexible Adjustment of Time Section Size
Chapter 71
Cyrus Shahabi, Mehrdad Jahangiri, Dimitris Sacharidis
Data analysis systems require range-aggregate query answering of large multidimensional datasets. We provide the necessary framework to build a... Sample PDF
Hybrid Query and Data Ordering for Fast and Progressive Range-Aggregate Query Answering
Chapter 72
Xiuju Fu, Lipo Wang, GihGuang Hung, Liping Goh
Classification decisions from linguistic rules are more desirable compared to complex mathematical formulas from support vector machine (SVM)... Sample PDF
Linguistic Rule Extraction from Support Vector Machine Classifiers
Chapter 73
Moonjung Cho, Jian Pei, Haixun Wang, Wei Wang
Frequent pattern mining is an important data-mining problem with broad applications. Although there are many in-depth studies on efficient frequent... Sample PDF
Preference-Based Frequent Pattern Mining
Chapter 74
Tadao Takaoka, Nigel K.L. Pope, Kevin E. Voges
In this chapter, we present an overview of some common data mining algorithms. Two techniques are considered in detail. The first is association... Sample PDF
Algorithms for Data Mining
Chapter 75
Anthony Bagnall, Gavin Cawley, Ian Whittley, Larry Bull, Matthew Studley, Mike Pettipher, Firat Tekiner
This article describes the entry of the Super Computer Data Mining (SCDM) Project to the 10th Pacific-Asia Conference on Knowledge Discovery and... Sample PDF
Super Computer Heterogeneous Classifier Meta-Ensembles
Chapter 76
Navin Kumar, Aryya Gangopadhyay, George Karabatis, Sanjay Bapna, Zhiyuan Chen
Navigating through multidimensional data cubes is a nontrivial task. Although On-Line Analytical Processing (OLAP) provides the capability to view... Sample PDF
Navigation Rules for Exploring Large Multidimensional Data Cubes
Chapter 77
Christie I. Ezeife, Timothy E. Ohanekwu
Identifying integrated records that represent the same real-world object in numerous ways is just one form of data disparity (dirt) to be resolved... Sample PDF
The Use of Smart Tokens in Cleaning Integrated Warehouse Data
Chapter 78
Gian Piero Zarri
In this chapter, we evoke first the ubiquity and the importance of the so-called ‘narrative’ information, showing that the usual ontological tools... Sample PDF
An Implemented Representation and Reasoning System for Creating and Exploiting Large Knowledge Bases of Narrative Information
Chapter 79
Margaret H. Dunham, Nathaniel Ayewah, Zhigang Li, Kathryn Bean, Jie Huang
The spatio-temporal prediction problem requires that one or more future values be predicted for time series input data obtained from sensors at... Sample PDF
Spatio-Temporal Prediction Using Data Mining Tools
Chapter 80
Taeho Hong, Woojong Suh
Data mining has drawn much attention in generating the useful information from Web data. Data mining techniques have typically considered... Sample PDF
Data Mining Using Qualitative Information on the Web
Chapter 81
Masoud Mohammadian, Ric Jentzsch
The World Wide Web has added an abundance of data and information to the complexity of information for disseminators and users alike. With this... Sample PDF
Computational Intelligence Techniques Driven Intelligent Agents for Web Data Mining and Information Retrieval
Chapter 82
Kuldeep Kumar
Data mining has emerged as one of the hottest topics in recent years. It is an extraordinarily broad area and is growing in several directions. With... Sample PDF
Internet Data Mining Using Statistical Techniques
Chapter 83
Steffen Bickel, Tobias Scheffer
E-mail has become one of the most important communication media for business and private purposes. Large amounts of past e-mail records reside on... Sample PDF
Mining E-Mail Data
Chapter 84
Neil C. Rowe
We survey research on using captions in data mining from the Web. Captions are text that describes some other information (typically, multimedia).... Sample PDF
Exploiting Captions for Web Data Mining
Chapter 85
A. Andreevskaia, R. Abi-Aad, T. Radhakrishnan
This chapter presents a tool for knowledge acquisition for user profiling in electronic commerce. The knowledge acquisition in e-commerce is a... Sample PDF
Agent-Mediated Knowledge Acquisition for User Profiling
Chapter 86
John Goh, David Taniar
Mobile user data mining is the process of extracting interesting knowledge from data collected from mobile users through various data mining... Sample PDF
Mobile User Data Mining and Its Applications
Chapter 87
Dan Steinberg, Mikhaylo Golovnya, Nicholas Scott Cardell
Mobile phone customers face many choices regarding handset hardware, add-on services, and features to subscribe to from their service providers.... Sample PDF
Mobile Phone Customer Type Discrimination via Stochastic Gradient Boosting
Chapter 88
Shi-Ming Huang, Binshan Lin, Qun-Shi Deng
This research proposes an intelligent cache mechanism for a data warehouse system in a mobile environment. Because mobile devices can often be... Sample PDF
Intelligent Cache Management for Mobile Data Warehouse Systems
Chapter 89
H. Azzag, F. Picarougne, C. Guinot, G. Venturini
We present in this chapter a new 3D interactive method for visualizing multimedia data with virtual reality named VRMiner. We consider that an... Sample PDF
VRMiner: A Tool for Multimedia Database Mining With Virtual Reality
Chapter 90
Mehmed Kantardzic, Pedram Sadeghian, Walaa M. Sheta
Advances in computing techniques, as well as the reduction in the cost of technology have made possible the viability and spread of large virtual... Sample PDF
Spatial Navigation Assistance System for Large Virtual Environments: The Data Mining Approach
Chapter 91
Kurt Stockinger, Kesheng Wu
In this chapter we discuss various bitmap index technologies for efficient query processing in data warehousing applications. We review the existing... Sample PDF
Bitmap Indices for Data Warehouses
Chapter 92
Karen C. Davis, Ashima Gupta
Bitmap Indexes (BIs) allow fast access to individual attribute values that are needed to answer a query by storing a bit for each distinct value and... Sample PDF
Indexing in Data Warehousing: Bitmaps and Beyond
Chapter 93
Herna L. Viktor, Eric Paquet
The current explosion of data and information, mainly caused by data warehousing technologies as well as the extensive use of the Internet and its... Sample PDF
Visualization Techniques for Data Mining
Chapter 94
Jung Hwan Oh, Jeong Kyu Lee, Sae Hwang
Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of... Sample PDF
Video Data Mining
Chapter 95
Shouhong Wang, Hai Wang
In the data mining field, people have no doubt that high level information (or knowledge) can be extracted from the database through the use of... Sample PDF
Interactive Visual Data Mining
Chapter 96
Jilin Han, Le Gruenwald, Tyrrell Conway
The study of gene expression levels under defined experimental conditions is an important approach to understand how a living cell works.... Sample PDF
Data Mining in Gene Expression Analysis: A Survey
Chapter 97
Takashi Kido
This chapter introduces computational methods for detecting complex disease loci with haplotype analysis. It argues that the haplotype analysis... Sample PDF
A Haplotype Analysis System for Genes Discovery of Common Diseases
Chapter 98
Chandra S. Amaravadi
In the past decade, a new and exciting technology has unfolded on the shores of the information systems area. Based on a combination of statistical... Sample PDF
Strategic Utilization of Data Mining
Chapter 99
George Tzanis, Christos Berberidis, Ioannis Vlahavas
At the end of the 1980s, a new discipline named data mining emerged. The introduction of new technologies such as computers, satellites, new mass... Sample PDF
Biological Data Mining
Chapter 100
Feng Chu, Lipo Wang
Accurate diagnosis of cancers is of great importance for doctors to choose a proper treatment. Furthermore, it also plays a key role in the... Sample PDF
Biomedical Data Mining Using RBF Neural Networks
Chapter 101
Boris Galitsky
Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting, and utilizing information from biological sequences and... Sample PDF
Bioinformatics Data Management and Data Mining
Chapter 102
Pedro Gabriel Ferreira, Paulo Jorge Azevedo
Protein sequence motifs describe, through means of enhanced regular expression syntax, regions of amino-acids that have been conserved across... Sample PDF
Deterministic Motif Mining in Protein Databases
Chapter 103
Christopher Besemann, Anne Denton, Ajay Yekkirala, Ron Hutchison, Anderson Marc
In this chapter, we discuss the use of differential association rules to study the annotations of proteins in one or more interaction networks.... Sample PDF
Differential Association Rules: Understanding Annotations in Protein Interaction Networks
Chapter 104
Christian Baumgartner, Armin Graber
This chapter provides an overview of the knowledge discovery process in metabolomics, a young discipline in the life sciences arena. It introduces... Sample PDF
Data Mining and Knowledge Discovery in Metabolomics
Chapter 105
Kwangmin Choi, Sun Kim
Understanding the genetic content of a genome is a very important but challenging task. One of the most effective methods to annotate a genome is to... Sample PDF
Comparative Genome Annotation Systems
Chapter 106
Theodore L. Perry, Travis Tucker, Laurel R. Hudson, William Gandy, Amy L. Neftzger, Guy B. Hamar
Healthcare has become a data-intensive business. Over the last 30 years, we have seen significant advancements in the areas of health information... Sample PDF
The Application of Data Mining Techniques in Health Plan Population Management: A Disease Management Approach
Chapter 107
Colleen Cunningham, Xiaohua Hu
Given the exponential growth rate of medical data and the accompanying biomedical literature, more than 10,000 documents per week (Leroy et al.... Sample PDF
Data Mining Medical Digital Libraries
Chapter 108
Indranil Bose
Diabetes is a disease worrying hundreds of millions of people around the world. In the USA, the population of diabetic patients is about 15.7... Sample PDF
Data Mining in Diabetes Diagnosis and Detection
Chapter 109
L. Venkat Narayanan
In an increasingly competitive market, banks are constantly searching for sustainable competitive advantage to help them maintain their edge against... Sample PDF
Data Warehousing and Analytics in Banking: Concepts
Chapter 110
L. Venkat Narayanan
Data Warehousing and Analytics represent one of the foremost technologies that can be used by banks to obtain sustainable competitive advantage.... Sample PDF
Data Warehousing and Analytics in Banking: Implementation
Chapter 111
Anna Olecka
This chapter will focus on challenges in modeling credit risk for new accounts acquisition process in the credit card industry. First section... Sample PDF
Beyond Classification: Challenges of Data Mining for Credit Scoring
Chapter 112
Desheng Wu, David L. Olson
The technique for order preference by similarity to ideal solution (TOPSIS) is a technique that can consider any number of measures, seeking to... Sample PDF
A TOPSIS Data Mining Demonstration and Application to Credit Scoring
Chapter 113
Jeff Hoffman
NASA Missions are as varied as the mandate of the agency. From using satellite imaging to study climate change to scanning deep space with the... Sample PDF
The Utilization of Business Intelligence and Data Mining in the Insurance Marketplace
Chapter 114
Shastri L. Nimmagadda, Heinz Dreher
Several issues of database organization of petroleum industries have been highlighted. Complex geo-spatial heterogeneous data structures complicate... Sample PDF
Ontology-Based Data Warehousing and Mining Approaches in Petroleum Industries
Chapter 115
Shanfeng Chu, Xiaotie Deng, Qizhi Fang, Weimin Zhang
Web search engines are one of the most popular services to help users find useful information on the Web. Although many studies have been carried... Sample PDF
A Study on Web Searching: Overlap and Distance of the Search Engine Results
Chapter 116
Richi Nayak
The business needs, the availability of huge volumes of data and the continuous evolution in Web services functions derive the need of application... Sample PDF
Data Mining in Web Services Discovery and Monitoring
Chapter 117
Mohamed Hammami, Youssef Chahir, Liming Chen
Along with the ever growing Web is the proliferation of objectionable content, such as sex, violence, racism, and so forth. We need efficient tools... Sample PDF
A Data Mining Driven Approach for Web Classification and Filtering Based on Multimodal Content Analysis
Chapter 118
Marko Brunzel, Myra Spiliopoulou
The automated discovery of relationships among terms contributes to the automation of the ontology engineering process and allows for sophisticated... Sample PDF
Acquiring Semantic Sibling Associations from Web Documents
Chapter 119
Jenq-Foung Yao, Yongqiao Xiao
Web usage mining aims to discover useful patterns in Web usage data, and these patterns provide useful information about the user’s browsing... Sample PDF
Traversal Pattern Mining in Web Usage Data
Chapter 120
Richi Nayak
Web services have recently received much attention in businesses. However, a number of challenges such as lack of experience in estimating the... Sample PDF
Facilitating and Improving the Use of Web Services with Data Mining
Chapter 121
Mohammad M. Masud, Latifur Khan, Bhavani Thuraisingham
This work applies data mining techniques to detect e-mail worms. E-mail messages contain a number of different features such as the total number of... Sample PDF
E-Mail Worm Detection Using Data Mining
Chapter 122
Yan Zho, Yaohua Chen, Yiyu Yao
While many data mining models concentrate on automation and efficiency, interactive data mining models focus on adaptive and effective... Sample PDF
User-Centered Interactive Data Mining
Chapter 123
Antonino Staiano, Lara De Vinco, Giuseppe Longo, Roberto Tagliaferri
Probabilistic Principal Surfaces (PPS) is a non linear latent variable model with very powerful visualization and classification capabilities which... Sample PDF
Advanced Data Mining and Visualization Techniques with Probabilistic Principal Surfaces: Applications to Astronomy and Genetics
Chapter 124
Qingyu Zhang, Richard S. Segall
This chapter illustrates the use of data mining as a computational intelligence methodology for forecasting data management needs. Specifically... Sample PDF
Using Data Mining for Forecasting Data Management Needs
Chapter 125
Kesaraporn Techapichetvanich, Amitava Datta
Both visualization and data mining have become important tools in discovering hidden relationships in large data sets, and in extracting useful... Sample PDF
Visual Data Mining for Discovering Association Rules
Chapter 126
Rafal Angryk, Roy Ladner, Frederick E. Petry
In this chapter, we consider the application of generalization-based data mining to fuzzy similarity-based object-oriented databases (OODBs).... Sample PDF
Generalization Data Mining in Fuzzy Object-Oriented Databases
Chapter 127
Nikos Pelekis, Babis Theodoulikis, Ioannis Kopanakis, Yannis Theodoridis
We study the problem of classification as this is presented in the context of data mining. Among the various approaches that are investigated, we... Sample PDF
Fuzzy Miner: Extracting Fuzzy Rules from Numerical Patterns
Chapter 128
Svetlana Mansmann, Marc H. Scholl
Comprehensive data analysis has become indispensable in a variety of domains. OLAP (On-Line Analytical Processing) systems tend to perform poorly or... Sample PDF
Empowering the OLAP Technology to Support Complex Dimension Hierarchies
Chapter 129
John D. Wells, Traci J. Hess
Many businesses have made or are making significant investments in data warehouses that reportedly support a myriad of decision support systems... Sample PDF
Understanding Decision-Making in Data Warehousing and Related Decision Support Systems: An Explanatory Study of a Customer Relationship Management Application
Chapter 130
Mesbah U. Ahmed, Vikas Agrawal, Udayan Nandkeolyar, P. S. Sundararaghavan
In any online decision support system, the backbone is a data warehouse. In order to facilitate rapid response to complex business decision support... Sample PDF
Statistical Sampling to Instantiate Materialized View Selection Problems in Data Warehouses
Chapter 131
Alex Burns, Shital Shah, Andrew Kusiak
This paper presents a hybrid approach that integrates a genetic algorithm (GA) and data mining to produce control signatures. The control signatures... Sample PDF
Development of Control Signatures with a Hybrid Data Mining and Genetic Algorithm
Chapter 132
George Potamias, Alexandros Kanterakis
With the completion of various whole genomes, one of the fundamental bioinformatics tasks is the identification of functional regulatory regions... Sample PDF
Feature Selection for the Promoter Recognition and Prediction Problem
Chapter 133
Hadrian Peter, Charles Greenidge
Modern database systems have incorporated the use of DSS (Decision Support Systems) to augment their decision-making business function and to allow... Sample PDF
Data Warehousing Search Engine
Chapter 134
Sherry Y. Chen, Xiaohui Liu
There is an explosion in the amount of data that organizations generate, collect, and store. Organizations are gradually relying more on new... Sample PDF
Data Mining in Practice
Chapter 135
Diego Liberati
Four main general purpose approaches inferring knowledge from data are presented as a useful pool of at least partially complementary techniques... Sample PDF
Model Identification Through Data Mining
Chapter 136
Hamid R. Nemati, Christopher D. Barko
An increasing number of organizations are struggling to overcome “information paralysis” — there is so much data available that it is difficult to... Sample PDF
Organizational Data Mining (ODM): An Introduction
Chapter 137
Isabel Ramos, João Álvaro Carvalho
Scientific or organizational knowledge creation has been addressed from different perspectives along the history of science and, in particular, of... Sample PDF
Constructionist Perspective of Organizational Data Mining
Chapter 138
Chandra S. Amaravadi, Farhad Daneshgar
Data mining has quickly emerged as a tool that can allow organizations to exploit their information assets. In this chapter, we suggest how this... Sample PDF
The Role of Data Mining in Organizational Cognition
Chapter 139
Ana Isabel Canhoto
The use of automated systems to collect, process and analyse vast amounts of data is now integral to the operations of many corporations and... Sample PDF
Ontology-Based Interpretation and Validation of Mined Knowledge: Normative and Cognitive Factors in Data Mining
Chapter 140
Susanta Mitra, Aditya Bagchi, A. K. Bandyopadhyay
A social network defines the structure of a social community like an organization or institution, covering its members and their... Sample PDF
Design of a Data Model for Social Network Applications
Chapter 141
Janet Delve
Data Warehousing is now a well-established part of the business and scientific worlds. However, up until recently, data warehouses were restricted... Sample PDF
Humanities Data Warehousing
Chapter 142
Marvin D. Troutt, Lori K. Long
In this paper, we briefly review and update our earlier work (Long & Troutt, 2003) on the topic of data mining in the human resources area. To gain... Sample PDF
Data Mining in Human Resources
Chapter 143
Igor Nai Fovino
Intense work in the area of data mining technology and in its applications to several domains has resulted in the development of a large variety of... Sample PDF
Privacy Preserving Data Mining, Concepts, Techniques, and Evaluation Methodologies
Chapter 144
Lixin Fu, Hamid Nemati, Fereidoon Sadri
Privacy-preserving data mining (PPDM) refers to data mining techniques developed to protect sensitive data while allowing useful information to be... Sample PDF
Privacy-Preserving Data Mining and the Need for Confluence of Research and Practice
Chapter 145
Les Pang
Data mining has been a successful approach for improving the level of business intelligence and knowledge management throughout an organization.... Sample PDF
Data Mining In the Federal Government
Chapter 146
Franklin Maxwell Haper
Data warehousing is a technology architecture designed to organize disparate data sources into a single repository of information. As such, it... Sample PDF
Data Warehousing and the Organization of Governmental Databases
Chapter 147
Àkos Felsovalyi, Jennifer Courant
Banking has changed rapidly over the last decades due to the ability to capture massive data sets easily and the availability of new tools for... Sample PDF
Data Mining and the Banking Sector: Managing Risk in Lending and Credit Card Activities
Chapter 148
Indranil Bose, Cheng Pui Kan, Chi King Tsz, Lau Wai Ki, Wong Cho Hung
Credit scoring is one of the most popular uses of data mining in the financial industry. Credit scoring can be defined as a technique that helps... Sample PDF
Data Mining for Credit Scoring
Chapter 149
André de Carvalho, Antonio P. Braga, Teresa Ludermir
The widespread use of databases and the fast increase in the volume of data they store are creating problems and new opportunities for credit... Sample PDF
Credit Card Users' Data Mining
Chapter 150
Mahesh S. Raisinghani, Manoj K. Singh
Supply chain comprises the flow of products, information, and money. In traditional supply chain management, business processes are disconnected... Sample PDF
Data Mining for Supply Chain Management in Complex Networks
Chapter 151
David Encke
Researchers have known for some time that nonlinearity exists in the financial markets and that neural networks can be used to forecast market... Sample PDF
Neural Network-Based Stock Market Return Forecasting Using Data Mining for Variable Reduction
Chapter 152
Murat Caner Testik, George C. Runger, Bradford Kirkman-Liff, Edward A. Smith
Health care organizations are struggling to find new ways to cut healthcare utilization and costs while improving quality and outcomes. Predictive... Sample PDF
Data Mining and Knowledge Discovery in Healthcare Organizations: A Decision-Tree Approach
Chapter 153
N. Sriraam, V. Natasha, H. Kaur
Data mining has been emerging recently as a viable computational tool for autonomous decision making especially in the field of medical... Sample PDF
Data Mining Techniques and Medical Decision Making for Urological Dysfunction
Chapter 154
Susan E. George
This chapter presents a survey of medical data mining focusing upon the use of heuristic techniques. We observe that medical mining has some unique... Sample PDF
Heuristics in Medical Data Mining
Chapter 155
Sikha Bagui
This paper presents a knowledge discovery effort to retrieve meaningful information about crime from a U.S. state database. The raw data were... Sample PDF
An Approach to Mining Crime Patterns
Chapter 156
Bamshad Mobasher
Web usage mining refers to the automatic discovery and analysis of patterns in clickstream and associated data collected or generated as a result of... Sample PDF
Web Usage Mining Data Preparation
Chapter 157
Ankur Jain, Lalit Wangikar, Martin Ahrens, Ranjan Rao, Suddha Sattwa Kundu, Sutirtha Ghosh
In this article we discuss how we have predicted the third generation (3G) customers using logistic regression analysis and statistical tools like... Sample PDF
Classification Of 3G Mobile Phone Customers
Chapter 158
Jeff Zeanah
This chapter discusses impediments to exploratory data mining success. These impediments were identified based on anecdotal observations from... Sample PDF
Impediments to Exploratory Data Mining Success
Chapter 159
Jeffrey Hsu
Most businesses generate, are surrounded by, and are even overwhelmed by data — much of it never used to its full potential for gaining insights... Sample PDF
Data Mining and Business Intelligence: Tools, Technologies, and Applications
Chapter 160
Auroop R. Ganguly, Amar Gupta, Shiraj Khan
Information by itself is no longer perceived as an asset. Billions of business transactions are recorded in enterprise-scale data warehouses every... Sample PDF
Data Mining and Decision Support for Business and Science
Chapter 161
Aristides Triantafillakis, Panagiotis Kanellis, Drakoulis Martakos
The purpose of this paper is to raise awareness and identify a number of challenges regarding the issue of data warehouse interoperation in... Sample PDF
Data Warehousing Interoperability for the Extended Enterprise
Chapter 162
Richard Mathieu, Reuven R. Levary
Every finished product has gone through a series of transformations. The process begins when manufacturers purchase the raw materials that will be... Sample PDF
Data Warehousing and Mining in Supply Chains
Chapter 163
Jon R. Wright, Gregg T. Vesonder, Tamraparni Dasu
In an enterprise setting, a major challenge for any data mining operation is managing data streams or feeds, both data and metadata, to ensure a... Sample PDF
Management of Data Streams for Large-Scale Data Mining
Chapter 164
Jin Sung Kim
One of the attractive topics in the field of Internet business is blending Artificial Intelligence (AI) techniques with the business process. In... Sample PDF
Customized Recommendation Mechanism Based on Web Data Mining and Case-Based Reasoning
Chapter 165
Scott Nicholson, Jeffrey Stanton
Library and information services in corporations, schools, universities and communities capture information about their users, circulation history... Sample PDF
Gaining Strategic Advantage Through Bibliomining: Data Mining for Management Decisions in Corporate, Special, Digital, and Traditional Libraries
Chapter 166
Edilberto Casado
Business intelligence (BI) is a key topic in business today, since it is focused on strategic decision making and on the search of value from... Sample PDF
Expanding Data Mining Power with System Dynamics
Chapter 167
Richi Nayak
Research and practices in mobile (m-) business have seen an exponential growth in the last decade (CNN, 2002; Leisen, 2000; McDonough, 2002; Purba... Sample PDF
Data Mining and Mobile Business Data
Chapter 168
T. T. Wong
Nowadays, many enterprises manufacture and distribute their products or services globally, and quite a number of smart organizations are formed on... Sample PDF
Neural Data Mining System for Trust-Based Evaluation in Smart Organizations
Chapter 169
Ye-Sho Chen, Robert Justis, P. Pete Chong
Franchising has been used by businesses as a growth strategy. Based on the authors’ cumulative research and experience in the industry, this paper... Sample PDF
Data Mining in Franchise Organizations
Chapter 170
Henry Dillon, Beverley Hope
Knowledge discovery in databases (KDD) is a field of research that studies the development and use of various data analysis tools and techniques.... Sample PDF
Translating Advances in Data Mining in Business Operations: The Art of Data Mining in Retailing
Chapter 171
Hugh J. Watson, Barbara H. Wixom, Dale L. Goodhue
Data warehouses are helping resolve a major problem that has plagued decision support applications over the years — a lack of good data. Top... Sample PDF
Data Warehousing: The 3M Experience
Chapter 172
Indranil Bose, Lam Albert Kar Chun, Leung Vivien Wai Yue, Li Hoi Wan Ines, Wong Oi Ling Helen
The retailing giant Wal-Mart owes its success to the efficient use of information technology in its operations. One of the noteworthy advances made... Sample PDF
Business Data Warehouse: The Case of Wal-Mart
Chapter 173
Kate A. Smith, Mark S. Dale
This chapter employs Michael Porter’s Five Forces model to understand the potential strategic value of data mining within the Australian banking... Sample PDF
A Porter Framework for Understanding the Strategic Potential of Data Mining for the Australian Banking Industry
Chapter 174
Chi Kin Chan
The traditional approach to forecasting involves choosing the forecasting method judged most appropriate of the available methods and applying it to... Sample PDF
Data Mining for Combining Forecasts in Inventory Management
Chapter 175
Jianxin ("Roger") Jiao, Yiyang Zhang, Martin Helander
This chapter applies data-mining techniques to help manufacturing companies analyze their customers’ requirements. Customer requirement analysis has... Sample PDF
Analytical Customer Requirement Analysis Based on Data Mining
Chapter 176
Yang Yu, De-Chuan Zhan, Xu-Ying Liu, Ming Li, Zhi-Hua Zhou
Our LAMDAer team has won the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2006 Data Mining Competition (open... Sample PDF
Predicting Future Customers via Ensembling Gradually Expanded Trees
Chapter 177
Victor S.Y. Lo
Data mining has been widely applied over the past two decades. In particular, marketing is an important application area. Many companies collect... Sample PDF
Marketing Data Mining
Chapter 178
Jack Cook
Decision makers thirst for answers to questions. As more data is gathered, more questions are posed: Which customers are most likely to respond... Sample PDF
Ethics Of Data Mining
Chapter 179
Joseph A. Cazier, Ryan C. LaBrie
As we have increasing privacy and risk concerns in the world today with identity theft, questionable marketing, data mining, and profiling, it is... Sample PDF
Ethical Dilemmas in Data Mining and Warehousing
Chapter 180
Yücel Saygin
Data regarding people and their activities have been collected over the years, which has become more pervasive with widespread usage of the... Sample PDF
Privacy and Confidentiality Issues in Data Mining
Chapter 181
Hamid R. Nemati, Charmion Brathwaite, Kara Harrington
Technological advances and decreased costs of implementing and using technology have allowed for vast amounts of data to be collected, used and... Sample PDF
Privacy Implications of Organizational Data Mining
Chapter 182
James Lawler, John C. Molluzzo
Many companies, such as Wal-Mart, store much of their business and customer data in large databases called data warehouses. Their customers are not... Sample PDF
Privacy in Data Mining Textbooks
Chapter 183
Aleksandar Lazarevic
Today computers control power, oil and gas delivery, communication systems, transportation networks, banking and financial services, and various... Sample PDF
Data Mining for Intrusion Detection
Chapter 184
Parviz Partow-Navid, Ludwig Slusky
Web mining is the application of data mining techniques to discover the usage patterns of Web data, in order to better serve the needs of Web site... Sample PDF
E-Commerce and Data Mining: Integration Issues and Challenges
Chapter 185
Hokey Min, Ahmed Emam
A successful path to purchasing negotiation often hinges on the buyer’s ability to gain relative bargaining strength. The buyer’s bargaining... Sample PDF
A Data Mining Approach to Formulating a Successful Purchasing Negotiation Strategy
Chapter 186
Thomas Chesney, Kay Penny, Peter Oakley, Simon Davies, David Chesney, Nicola Maffulli, John Templeton
Trauma audit is intended to develop effective care for injured patients through process and outcome analysis, and dissemination of results. The... Sample PDF
Data Mining Medical Information: Should Artificial Neural Networks Be Used to Analyse Trauma Audit Data?
Chapter 187
Gwo-Jen Hwang
In recent years, researchers have attempted to develop more effective distance education systems. Nevertheless, students in network-based learning... Sample PDF
A Data Mining Approach to Diagnosing Student Learning Problems in Sciences Courses
Chapter 188
Malcolm J. Beynon
The efficacy of data mining lies in its ability to identify relationships amongst data. This chapter investigates that constraining this efficacy is... Sample PDF
Effective Intelligent Data Mining Using Dempster-Shafer Theory
Chapter 189
Rahul Singh, Richard T. Redmond, Victoria Yoon
Intelligent decision support requires flexible, knowledge-driven analysis of data to solve complex decision problems faced by contemporary decision... Sample PDF
An Intelligent Support System Integrating Data Mining and Online Analytical Processing
Chapter 190
Jianting Zhang, Wieguo Liu, Le Gruenwald
Decision trees (DT) have been widely used for training and classification of remotely sensed image data due to their capability to generate human... Sample PDF
A Successive Decision Tree Approach to Mining Remotely Sensed Image Data
Chapter 191
George Tzanis, Christos Berberidis
Association rule mining is a popular task that involves the discovery of co-occurrences of items in transaction databases. Several extensions of the... Sample PDF
Mining for Mutually Exclusive Items in Transaction Databases
Chapter 192
Benjamin Griffiths, Malcolm J. Beynon
Predictive accuracy, as an estimation of a classifier’s future performance, has been studied for at least seventy years. With the advent of the... Sample PDF
Re-Sampling Based Data Mining Using Rough Set Theory
Chapter 193
Hai Wang, Shouhong Wang
Survey is one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data... Sample PDF
Data Mining with Incomplete Data
Chapter 194
Yanbing Liu, Shixin Sun, Menghao Wang, Hong Tang
QOSPF (Quality of Service Open Shortest Path First) based on QoS routing has been recognized as a missing piece in the evolution of QoS-based services... Sample PDF
Routing Attribute Data Mining Based on Rough Set Theory
Chapter 195
Alkis Simitsis, Panos Vassiliadis, Spiros Skiadopoulos, Timos Sellis
In the early stages of a data warehouse project, the designers/administrators have to come up with a decision concerning the design and deployment... Sample PDF
Data Warehouse Refreshment
Chapter 196
John Talburt, Richard Wang, Kimberly Hess, Emily Kuo
This chapter introduces abstract algebra as a means of understanding and creating data quality metrics for entity resolution, the process in which... Sample PDF
An Algebraic Approach to Data Quality Metrics for Entity Resolution over Large Datasets
Chapter 197
Biren Shah, Karthik Ramachandran, Vijay Raghavan
Materialized view selection is one of the crucial decisions in designing a data warehouse for optimal efficiency. Static selection of views may... Sample PDF
A Hybrid Approach for Data Warehouse View Selection
Chapter 198
Shi-Ming Huang, David C. Yen, Hsiang-Yuan Hsueh
The materialized view approach is widely adopted in implementations of data warehouse systems for efficiency purposes. In terms of the... Sample PDF
A Space-Efficient Protocol for Consistency of External View Maintenance on Data Warehouse Systems: A Proxy Approach
Chapter 199
Rodrigo Salvador Monteiro, Geraldo Zimbrao, Holger Schwarz, Bernhard Mitschang, Jano Moreira de Souza
This chapter presents the core of the DWFIST approach, which is concerned with supporting the analysis and exploration of frequent itemsets and... Sample PDF
DWFIST: The Data Warehouse of Frequent Itemsets Tactics Approach
Chapter 200
Tho Hoan Pham, Tu Bao Ho
There are in general three approaches to rule induction: exhaustive search, divide-and-conquer, and separate-and-conquer (or its extension as... Sample PDF
A Hyper-Heuristic for Descriptive Rule Induction
Chapter 201
Ying Chen, Frank Dehne, Todd Eavis, A. Rau-Chaplin
This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor based on a novel optimized data... Sample PDF
Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel
Chapter 202
Simon K. Milton, Ed Kazmierczak
Data modelling languages are used in today’s information systems engineering environments. Many have a degree of hype surrounding their quality and... Sample PDF
An Ontology of Data Modelling Languages: A Study Using a Common-Sense Realistic Ontology
Chapter 203
Alexandros Nanopoulos, Apostolos N. Papadopoulos, Yannis Manolopoulos, Tatjana Welzer-Druzovec
The existence of noise in the data significantly impacts the accuracy of classification. In this article, we are concerned with the development of... Sample PDF
Robust Classification Based on Correlations Between Attributes
Chapter 204
Yun Sing Koh, Nathan Rountree, Richard O’Keefe
Discovering association rules efficiently is an important data mining problem. We define sporadic rules as those with low support but high... Sample PDF
Finding Non-Coincidental Sporadic Rules Using Apriori-Inverse
Chapter 205
Carem C. Fabris, Alex A. Freitas
This paper focuses on the discovery of surprising unexpected patterns based on a data mining method that consists of detecting instances of... Sample PDF
Discovering Surprising Instances of Simpson's Paradox in Hierarchical Multidimensional Data
Chapter 206
Yongqiao Xiao, Jenq-Foung Yao, Guizhen Yang
Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs.... Sample PDF
Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees
Chapter 207
Sagar Savla, Sharma Chakravarthy
Sensor-based applications, such as smart homes, require prediction of event occurrences for automating the environment using time-series data... Sample PDF
A Single Pass Algorithm for Discovering Significant Intervals in Time-Series Data
Chapter 208
Pradeep Kumar, Raju S. Bapi, P. Radha Krishna
With the growth in the number of Web users and necessity for making information available on the Web, the problem of Web personalization has become... Sample PDF
SeqPAM: A Sequence Clustering Algorithm for Web Personalization
Chapter 209
Shawkat Ali, Kate A. Smith
The most critical component of kernel based learning algorithms is the choice of an appropriate kernel and its optimal parameters. In this paper we... Sample PDF
Kernel Width Selection for SVM Classification: A Meta-Learning Approach
Chapter 210
K. M. Azharul Hasan, Tatsuo Tsuji, Ken Higuchi
In this article, an efficient parallel implementation scheme of relational tables is proposed and evaluated. The scheme implements a relational... Sample PDF
A Parallel Implementation Scheme of Relational Tables Based on Multidimensional Extendible Array
Chapter 211
Rokia Missaoui, Ganaël Jatteau, Ameur Boujenoui, Sami Naouali
In this paper, we present alternatives for coupling data warehousing and data mining techniques so that they can benefit from each other’s advances... Sample PDF
Toward Integrating Data Warehousing with Data Mining Techniques
Chapter 212
Torben Bach Pedersen, Jesper Thorhauge, Søren E. Jespersen
Enormous amounts of information about Web site user behavior are collected in Web server logs. However, this information is only useful if it can be... Sample PDF
Combining Data Warehousing and Data Mining Techniques for Web Log Analysis
Chapter 213
D. Xuan Le, J. Wenny Rahayu, David Taniar
This article proposes a data warehouse integration technique that combines data and documents from different underlying documents and database... Sample PDF
Web Data Warehousing Convergence: From Schematic to Systematic
Chapter 214
John M. Artz
Data warehousing is an emerging technology that greatly extends the capabilities of relational databases specifically in the analysis of very large... Sample PDF
Web Technologies and Data Warehousing Synergies
Chapter 215
Gilbert W. Laware
This chapter introduces the need for the World Wide Web to provide a standard mechanism so individuals can readily obtain data, reports, research... Sample PDF
Metadata Management: A Requirement for Web Warehousing and Knowledge Management
Chapter 216
Hanny Yulius Limanto, Tay Joc Cing, Andrew Watkins
With the recent introduction of third generation (3G) technology in the field of mobile communications, mobile phone service providers will have to... Sample PDF
An Immune Systems Approach for Classifying Mobile Phone Usage
Chapter 217
Tiziana Catarci, Stephen Kimani, Stefano Lodi
Despite the existence of various data mining efforts that deal with user interface aspects, very few provide a formal specification of the syntax of... Sample PDF
User Interface Formalization in Visual Data Mining
Chapter 218
Junmei Wang, Wynne Hsu, Mong Li Lee
Recent interest in spatio-temporal applications has been fueled by the need to discover and predict complex patterns that occur when we observe the... Sample PDF
Mining in Spatio-Temporal Databases
Chapter 219
Zhong Qu
Image reconstruction is one of the key technologies in industrial computed tomography. In this paper, an efficient iterative image reconstruction... Sample PDF
Algebraic Reconstruction Technique in Image Reconstruction Based on Data Mining
Chapter 220
Marek Kretowski, Marek Grzes
This article presents a new evolutionary algorithm (EA) for induction of mixed decision trees. In nonterminal nodes of a mixed tree, different types... Sample PDF
Evolutionary Induction of Mixed Decision Trees
Chapter 221
Protima Banerjee, Xiaohua Hu, Illhio Yoo
Over the past few decades, data mining has emerged as a field of research critical to understanding and assimilating the large stores of data... Sample PDF
Semantic Data Mining
Chapter 222
Marie Aude Aufaure, Bénédicte Le Grand, Michel Soto, Nacera Bennacer
The increasing volume of data available on the Web makes information retrieval a tedious and difficult task. The vision of the Semantic Web... Sample PDF
Metadata- and Ontology-Based Semantic Web Mining
Chapter 223
Honghua Dai, Bamshad Mobasher
Web usage mining has been used effectively as an approach to automatic personalization and as a way to overcome deficiencies of traditional... Sample PDF
Integrating Semantic Knowledge with Web Usage Mining for Personalization
Chapter 224
Ioannis Karydis, Alexandros Nanopoulos, Yannis Manolopoulos
This chapter provides a broad survey of music data mining, including clustering, classification and pattern discovery in music. The data studied is... Sample PDF
Mining in Music Databases
Chapter 225
Janusz Swierzowicz
The development of information technology is particularly noticeable in the methods and techniques of data acquisition, high-performance computing... Sample PDF
Multimedia Data Mining Concept
Chapter 226
Brian C. Lovell, Shaokang Chen
While the technology for mining text documents in large databases could be said to be relatively mature, the same cannot be said for mining other... Sample PDF
Robust Face Recognition for Data Mining
Chapter 227
Jeffrey W. Seifert
A significant amount of attention appears to be focusing on how to better collect, analyze, and disseminate information. In doing so, technology is... Sample PDF
Data Mining and Homeland Security
Chapter 228
Bhavani Thuraisingham
Data mining is the process of posing queries to large quantities of data and extracting information often previously unknown using mathematical... Sample PDF
Homeland Security Data Mining and Link Analysis
Chapter 229
Gerasimos Marketos, Yannis Theodoridis, Ioannis S. Kalogeras
Earthquake data composes an ever increasing collection of earth science information for post-processing analysis. Earth scientists, local or... Sample PDF
Seismological Data Warehousing and Mining: A Survey
Chapter 230
Adam Fadlalla, Nilmini Wickramasinghe
This chapter provides insight into various areas within the medical field that strive to take advantage of different data mining techniques in order... Sample PDF
Realizing Knowledge Assets in the Medical Sciences with Data Mining: An Overview
Chapter 231
Jose Ma. J. Alvir, Javier Cabrera, Frank Caridi, Ha Nguyen
Mining clinical trials is becoming an important tool for extracting information that might help design better clinical trials. One important... Sample PDF
Mining Clinical Trial Data
Chapter 232
William Perrizo, Qiang Ding, Masum Serazi, Taufik Abidin, Baoying Wang
For several decades and especially with the preeminence of relational database systems, data is almost always formed into horizontal record... Sample PDF
Vertical Database Design for Scalable Data Mining