Data Driven vs. Metric Driven Data Warehouse Design

John M. Artz (The George Washington University, USA)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-60566-010-3.ch060

Although data warehousing theory and technology have been around for well over a decade, they may well be the next hot technologies. How can a technology sleep for so long and then begin to move rapidly to the foreground? This question can have several answers. Perhaps the technology had not yet caught up to the theory, or perhaps computer technology 10 years ago did not have the capacity to deliver what the theory promised. Perhaps the ideas and the products were just ahead of their time. All these answers are true to some extent. But the real answer, I believe, is that data warehousing is undergoing a radical theoretical and paradigmatic shift, and that shift will reposition data warehousing to meet future demands.
Chapter Preview


Just recently I started teaching a new course in data warehousing. I have only taught it a few times so far, but I have already noticed that there are two distinct and largely incompatible views of the nature of a data warehouse. A prospective student, who had several years of industry experience in data warehousing but little theoretical insight, came by my office one day to find out more about the course. “Are you an Inmonite or a Kimballite?” she inquired, reducing the possibilities to the core issues. “Well, I suppose if you put it that way,” I replied, “I would have to classify myself as a Kimballite.” William Inmon (2000, 2002) and Ralph Kimball (1996, 1998, 2000) are the two most widely recognized authors in data warehousing and represent two competing positions on the nature of a data warehouse.

The issue that this student was trying to get at was whether or not I viewed the dimensional data model as the core concept in data warehousing. I do, of course, but there is, I believe, a lot more to the emerging competition between these alternative views of data warehouse design. One of these views, which I call the data-driven view of data warehouse design, begins with existing organizational data. These data have more than likely been produced by existing transaction processing systems. They are cleansed and summarized and are used to gain greater insight into the functioning of the organization. The analysis that can be done is a function of the data that were collected in the transaction processing systems. This was, perhaps, the original view of data warehousing and, as will be shown, much of the current research in data warehousing assumes this view.

The competing view, which I call the metric-driven view of data warehouse design, begins by identifying key business processes that need to be measured and tracked over time in order for the organization to function more efficiently. A dimensional model is designed to facilitate that measurement over time, and data are collected to populate that dimensional model. If existing organizational data can be used to populate that dimensional model, so much the better. But if not, the data need to be acquired somehow. The metric-driven view of data warehouse design, as will be shown, is superior both theoretically and philosophically. In addition, it dramatically changes the research program in data warehousing. The metric-driven and data-driven approaches to data warehouse design have also been referred to, respectively, as metric pull versus data push (Artz, 2003).
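
As a rough illustration of the metric-driven idea, the sketch below starts from a metric to be tracked over time (revenue for a sales process) and only then defines the dimensional structure that the data must populate. The process, the class names (DateDim, ProductDim, SalesFact), and the sample rows are hypothetical, chosen purely for illustration; they do not come from the chapter.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical dimensions for a "sales" business process (illustration only).
@dataclass(frozen=True)
class DateDim:
    date_key: int   # surrogate key, e.g. 20090115
    year: int
    month: int


@dataclass(frozen=True)
class ProductDim:
    product_key: int
    category: str


# The fact record carries the metric the organization decided to track.
@dataclass(frozen=True)
class SalesFact:
    date_key: int
    product_key: int
    revenue: float


def revenue_by_month_and_category(facts, dates, products):
    """Sum the revenue metric along the date and product dimensions."""
    date_by_key = {d.date_key: d for d in dates}
    product_by_key = {p.product_key: p for p in products}
    totals = defaultdict(float)
    for fact in facts:
        d = date_by_key[fact.date_key]
        p = product_by_key[fact.product_key]
        totals[(d.year, d.month, p.category)] += fact.revenue
    return dict(totals)


# Made-up sample data; in the metric-driven view these rows would be
# collected or acquired specifically to populate the model.
dates = [DateDim(20090115, 2009, 1), DateDim(20090210, 2009, 2)]
products = [ProductDim(1, "books"), ProductDim(2, "music")]
facts = [
    SalesFact(20090115, 1, 100.0),
    SalesFact(20090115, 2, 40.0),
    SalesFact(20090210, 1, 60.0),
    SalesFact(20090115, 1, 25.0),
]

monthly = revenue_by_month_and_category(facts, dates, products)
for key in sorted(monthly):
    print(key, monthly[key])
```

In a data-driven design the direction would be reversed: the available transaction records would determine which facts and dimensions could be defined at all.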

Complete Chapter List

Search this Book:
Editorial Advisory Board
Contents by Volume
Contents by Topic
Alexander Tuzhilin
John Wang
Chapter 1
Action Rules Mining  (pages 1-5)
Zbigniew W. Ras, Elzbieta Wyrzykowska, Li-Shiang Tsay
There are two aspects of interestingness of rules that have been studied in data mining literature, objective and subjective measures (Liu et al.... Sample PDF
Action Rules Mining
Chapter 2
Ion Muslea
Inductive learning algorithms typically use a set of labeled examples to learn class descriptions for a set of user-specified concepts of interest.... Sample PDF
Active Learning with Multiple Views
Chapter 3
Xueping Li
The Internet has become a popular medium to disseminate information and a new platform to conduct electronic business (e-business) and electronic... Sample PDF
Adaptive Web Presence and Evolution through Web Log Analysis
Chapter 4
Hadrian Peter
Data warehouses have established themselves as necessary components of an effective IT strategy for large businesses. To augment the streams of data... Sample PDF
Aligning the Warehouse and the Web
Chapter 5
Dan Zhu
With the advent of technology, information is available in abundance on the World Wide Web. In order to have appropriate and useful information... Sample PDF
Analytical Competition for Managing Customer Relations
Chapter 6
Chun-Che Huang, Tzu-Liang ("Bill") Tseng
The Information Technology and Internet techniques are rapidly developing. Interaction between enterprises and customers has dramatically changed.... Sample PDF
Analytical Knowledge Warehousing for Business Intelligence
Chapter 7
Lisa Friedland
In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the... Sample PDF
Anomaly Detection for Inferring Social Structure
Chapter 8
J. Ben Schafer
In a world where the number of choices can be overwhelming, recommender systems help users find and evaluate items of interest. They connect users... Sample PDF
The Application of Data-Mining to Recommender Systems
Chapter 9
Gustavo Camps-Valls, Manel Martínez-Ramón, José Luis Rojo-Álvarez
In this chapter, we give a survey of applications of the kernel methods introduced in the previous chapter. We focus on different application... Sample PDF
Applications of Kernel Methods
Chapter 10
Sandra Elizabeth González Císaro
Much information stored in current databases is not always present at necessary different levels of detail or granularity for Decision-Making... Sample PDF
Architecture for Symbolic Object Warehouse
Chapter 11
Wenxue Huang, Milorad Krneta, Limin Lin, Jianhong Wu
An association pattern describes how a group of items (for example, retail products) are statistically associated together, and a meaningful... Sample PDF
Association Bundle Identification
Chapter 12
Vassilios S. Verykios
The enormous expansion of data collection and storage facilities has created an unprecedented increase in the need for data analysis and processing... Sample PDF
Association Rule Hiding Methods
Chapter 13
Yew-Kwong Woon
Association Rule Mining (ARM) is concerned with how items in a transactional database are grouped together. It is commonly known as market basket... Sample PDF
Association Rule Mining
Chapter 14
Luminita Dumitriu
The concept of Quantitative Structure-Activity Relationship (QSAR), introduced by Hansch and co-workers in the 1960s, attempts to discover the... Sample PDF
On Association Rule Mining for the QSAR Problem
Chapter 15
Anne Denton
Most data of practical relevance are structured in more complex ways than is assumed in traditional data mining algorithms, which are based on a... Sample PDF
Association Rule Mining of Relational Data
Chapter 16
Martine Cadot, Jean-Baptiste Maj, Tarek Ziadé
A manager would like to have a dashboard of his company without manipulating data. Usually, statistics have solved this challenge, but nowadays... Sample PDF
Association Rules and Statistics
Chapter 17
Zheng-Hua Tan
The explosive increase in computing power, network bandwidth and storage capacity has largely facilitated the production, transmission and storage... Sample PDF
Audio and Speech Processing for Data Mining
Chapter 18
Audio Indexing  (pages 104-109)
Gaël Richard
The enormous amount of unstructured audio data available nowadays and the spread of its use as a data source in many applications are introducing... Sample PDF
Audio Indexing
Chapter 19
Jamel Feki
Within today’s competitive economic context, information acquisition, analysis and exploitation became strategic and unavoidable requirements for... Sample PDF
An Automatic Data Warehouse Conceptual Design Approach
Chapter 20
Xiaoyan Yu, Manas Tungare, Weiguo Fan, Manuel Pérez-Quiñones, Edward A. Fox, William Cameron, Lillian Cassel
Starting with a vast number of unstructured or semistructured documents, text mining tools analyze and sift through them to present to users more... Sample PDF
Automatic Genre-Specific Text Classification
Chapter 21
Xin Zhang
Music information indexing based on timbre helps users to get relevant musical data in large digital music databases. Timbre is a quality of sound... Sample PDF
Automatic Music Timbre Indexing
Chapter 22
Shu-Chiang Lin
Many task analysis techniques and methods have been developed over the past decades, but identifying and decomposing a user’s task into small task... Sample PDF
A Bayesian Based Machine Learning Application to Task Analysis
Chapter 23
Yinghui Yang
Customer segmentation is the process of dividing customers into distinct subsets (segments or clusters) that behave in the same way or have similar... Sample PDF
Behavioral Pattern-Based Customer Segmentation
Chapter 24
Les Pang
Data warehousing has been a successful approach for supporting the important concept of knowledge management— one of the keys to organizational... Sample PDF
Best Practices in Data Warehousing
Chapter 25
Scott Nicholson
Most people think of a library as the little brick building in the heart of their community or the big brick building in the center of a college... Sample PDF
Bibliomining for Library Decision-Making
Chapter 26
Gustavo Camps-Valls, Alistair Morgan Chalk
Bioinformatics is a new, rapidly expanding field that uses computational approaches to answer biological questions (Baxevanis, 2005). These... Sample PDF
Bioinformatics and Computational Biology
Chapter 27
Jieping Ye, Ravi Janardan, Sudhir Kumar
Understanding the roles of genes and their interactions is one of the central challenges in genome research. One popular approach is based on the... Sample PDF
Biological Image Analysis via Matrix Approximation
Chapter 28
Ladjel Bellatreche
Scientific databases and data warehouses store large amounts of data ith several tables and attributes. For instance, the Sloan Digital Sky Survey... Sample PDF
Bitmap Join Indexes vs. Data Partitioning
Chapter 29
Lei Tang, Huan Liu, Jiangping Zhang
The unregulated and open nature of the Internet and the explosive growth of the Web create a pressing need to provide various services for content... Sample PDF
Bridging Taxonomic Semantics to Accurate Hierarchical Classification
Chapter 30
Arla Juntunen
The high level objectives of public authorities are to create value at minimal cost, and achieve ongoing support and commitment from its funding... Sample PDF
A Case Study of a Data Warehouse in the Finnish Police
Chapter 31
Johannes Gehrke
It is the goal of classification and regression to build a data mining model that can be used for prediction. To construct such a model, we are... Sample PDF
Classification and Regression Trees
Chapter 32
Classification Methods  (pages 196-201)
Aijun An
Generally speaking, classification is the action of assigning an object to a category according to the characteristics of the object. In data... Sample PDF
Classification Methods
Chapter 33
Andrzej Dominik
Classification is a classical and fundamental data mining (machine learning) task in which individual items (objects) are divided into groups... Sample PDF
Classification of Graph Structures
Chapter 34
Xinghua Fan
Text categorization (TC) is a task of assigning one or multiple predefined category labels to natural language texts. To deal with this... Sample PDF
Classifying Two-Class Chinese Texts in Two Steps
Chapter 35
Frank Klawonn, Frank Rehm
For many applications in knowledge discovery in databases finding outliers, rare events, is of importance. Outliers are observations, which deviate... Sample PDF
Cluster Analysis for Outlier Detection
Chapter 36
Tom Burr
One data mining activity is cluster analysis, which consists of segregating study units into relatively homogeneous groups. There are several types... Sample PDF
Cluster Analysis in Fitting Mixtures of Curves
Chapter 37
Dingxi Qiu, Edward C. Malthouse
Cluster analysis is a set of statistical models and algorithms that attempt to find “natural groupings” of sampling units (e.g., customers, survey... Sample PDF
Cluster Analysis with General Latent Class Model
Chapter 38
Cluster Validation  (pages 231-236)
Ricardo Vilalta, Tomasz Stepinski
Spacecrafts orbiting a selected suite of planets and moons of our solar system are continuously sending long sequences of data back to Earth. The... Sample PDF
Cluster Validation
Chapter 39
Athman Bouguettaya, Qi Yu
Clustering analysis has been widely applied in diverse fields such as data mining, access structures, knowledge discovery, software engineering... Sample PDF
Clustering Analysis of Data with High Dimensionality
Chapter 40
Joshua Zhexue Huang
A lot of data in real world databases are categorical. For example, gender, profession, position, and hobby of customers are usually defined as... Sample PDF
Clustering Categorical Data with k-Modes
Chapter 41
Mei Li, Wang-Chien Lee
With the advances in network communication, many large scale network systems have emerged. Peer-topeer (P2P) systems, where a large number of nodes... Sample PDF
Clustering Data in Peer-to-Peer Systems
Chapter 42
Anne Denton
Time series data is of interest to most science and engineering disciplines and analysis techniques have been developed for hundreds of years. There... Sample PDF
Clustering of Time Series Data
Chapter 43
On Clustering Techniques  (pages 264-268)
Sheng Ma, Tao Li
Clustering data into sensible groupings, as a fundamental and effective tool for efficient data organization, summarization, understanding and... Sample PDF
On Clustering Techniques
Chapter 44
Richard S. Segall
This chapter discusses four-selected software for data mining that are not available as free open-source software. The four-selected software for... Sample PDF
Comparing Four-Selected Data Mining Software
Chapter 45
Eamonn Keogh, Li Keogh, John C. Handley
Compression-based data mining is a universal approach to clustering, classification, dimensionality reduction, and anomaly detection. It is... Sample PDF
Compression-Based Data Mining
Chapter 46
Amin A. Abdulghani
The focus of online analytical processing (OLAP) is to provide a platform for analyzing data (e.g., sales data) with multiple dimensions (e.g.... Sample PDF
Computation of OLAP Data Cubes
Chapter 47
Elzbieta Malinowski, Esteban Zimányi
The advantages of using conceptual models for database design are well known. In particular, they facilitate the communication between users and... Sample PDF
Conceptual Modeling for Data Warehouse and OLAP Applications
Chapter 48
Constrained Data Mining  (pages 301-306)
Brad Morantz
Mining a large data set can be time consuming, and without constraints, the process could generate sets or rules that are invalid or redundant. Some... Sample PDF
Constrained Data Mining
Chapter 49
Carson Kai-Sang Leung
The problem of association rule mining was introduced in 1993 (Agrawal et al., 1993). Since then, it has been the subject of numerous studies. Most... Sample PDF
Constraint-Based Association Rule Mining
Chapter 50
Francesco Bonchi
Devising fast and scalable algorithms, able to crunch huge amount of data, was for many years one of the main goals of data mining research. But... Sample PDF
Constraint-Based Pattern Discovery
Chapter 51
Alexander mirnov
Decisions in the modern world are often made in rapidly changing, sometimes unexpected, situations. Such situations require availability of systems... Sample PDF
Context-Driven Decision Mining
Chapter 52
Marko Robnik-Šikonja
The research in machine learning, data mining, and statistics has provided a number of methods that estimate the usefulness of an attribute... Sample PDF
Context-Sensitive Attribute Evaluation
Chapter 53
Yi-Cheng Tu, Gang Ding
Database administration (tuning) is the process of adjusting database configurations in order to accomplish desirable performance goals. This job is... Sample PDF
Control-Based Database Tuning Under Dynamic Workloads
Chapter 54
Cost-Sensitive Learning  (pages 339-345)
Victor S. Sheng, Charles X. Ling
Classification is the most important task in inductive learning and machine learning. A classifier can be trained from a set of training examples... Sample PDF
Cost-Sensitive Learning
Chapter 55
Kehan Gao, Taghi M. Khoshgoftaar
Timely and accurate prediction of the quality of software modules in the early stages of the software development life cycle is very important in... Sample PDF
Count Models for Software Quality Estimation
Chapter 56
Christine W. Chan
An economic evaluation of a new oil well is often required, and this evaluation depends heavily on how accurately production of the well can be... Sample PDF
Data Analysis for Oil Production Prediction
Chapter 57
Seunghyun Im, Zbigniew W. Ras
This article discusses data security in Knowledge Discovery Systems (KDS). In particular, we presents the problem of confidential data... Sample PDF
Data Confidentiality and Chase-Based Knowledge Discovery
Chapter 58
Alfredo Cuzzocrea
OnLine Analytical Processing (OLAP) research issues (Gray, Chaudhuri, Bosworth, Layman, Reichart & Venkatrao, 1997) such as data cube modeling... Sample PDF
Data Cube Compression Techniques: A Theoretical Review
Chapter 59
Junjie Wu, Jian Chen, Hui Xiong
Cluster analysis (Jain & Dubes, 1988) provides insight into the data by dividing the objects into groups (clusters), such that objects in a cluster... Sample PDF
A Data Distribution View of Clustering Algorithms
Chapter 60
John M. Artz
Although data warehousing theory and technology have been around for well over a decade, they may well be the next hot technologies. How can it be... Sample PDF
Data Driven vs. Metric Driven Data Warehouse Design
Chapter 61
Data Mining and Privacy  (pages 388-393)
Esma Aïmeur
With the emergence of Internet, it is now possible to connect and access sources of information and databases throughout the world. At the same... Sample PDF
Data Mining and Privacy
Chapter 62
Paola Cerchiello
The aim of this contribution is to show one of the most important application of text mining. According to a wide part of the literature regarding... Sample PDF
Data Mining and the Text Categorization Framework
Chapter 63
Joaquín Ordieres-Meré, Manuel Castejón-Limas, Ana González-Marcos
The industrial plants, beyond subsisting, pursue to be leaders in increasingly competitive and dynamic markets. In this environment, quality... Sample PDF
Data Mining Applications in Steel Industry
Chapter 64
Soo Kim
Some people say that “success or failure often depends not only on how well you are able to collect data but also on how well you are able to... Sample PDF
Data Mining Applications in the Hospitality Industry
Chapter 65
Roberto Marmo
As a conseguence of expansion of modern technology, the number and scenario of fraud are increasing dramatically. Therefore, the reputation blemish... Sample PDF
Data Mining for Fraud Detection System
Chapter 66
Lior Rokach
In many modern manufacturing plants, data that characterize the manufacturing process are electronically collected and stored in the organization’s... Sample PDF
Data Mining for Improving Manufacturing Processes
Chapter 67
Luciana Dalla Valle
The term “internationalization” refers to the process of international expansion of firms realized through different mechanisms such as export... Sample PDF
Data Mining for Internationalization
Chapter 68
Silvia Figini
Customer lifetime value (LTV, see e.g. Bauer et al. 2005 and Rosset et al. 2003), which measures the profit generating potential, or value, of a... Sample PDF
Data Mining for Lifetime Value Estimation
Chapter 69
Diego Liberati
In many fields of research, as well as in everyday life, it often turns out that one has to face a huge amount of data, without an immediate grasp... Sample PDF
Data Mining for Model Identification
Chapter 70
Mª Dolores del Castillo
Email is now an indispensable communication tool and its use is continually growing. This growth brings with it an increase in the number of... Sample PDF
Data Mining for Obtaining Secure E-Mail Communications
Chapter 71
Ramdev Kanapady, Aleksandar Lazarevic
Structural health monitoring denotes the ability to collect data about critical engineering structural elements using various sensors and to detect... Sample PDF
Data Mining for Structural Health Monitoring
Chapter 72
Ng Yew Seng, Rajagopalan Srinivasan
Advancements in sensors and database technologies have resulted in the collection of huge amounts of process data from chemical plants. A number of... Sample PDF
Data Mining for the Chemical Process Industry
Chapter 73
Tom Burr
The genetic basis for some human diseases, in which one or a few genome regions increase the probability of acquiring the disease, is fairly well... Sample PDF
Data Mining in Genome Wide Association Studies
Chapter 74
Haipeng Wang
Protein identification (sequencing) by tandem mass spectrometry is a fundamental technique for proteomics which studies structures and functions of... Sample PDF
Data Mining in Protein Identification by Tandem Mass Spectrometry
Chapter 75
Aleksandar Lazarevic
In recent years, research in many security areas has gained a lot of interest among scientists in academia, industry, military and governmental... Sample PDF
Data Mining in Security Applications
Chapter 76
Gary Weiss
The telecommunications industry was one of the first to adopt data mining technology. This is most likely because telecommunication companies... Sample PDF
Data Mining in the Telecommunications Industry
Chapter 77
Les Pang
Data mining has been a successful approach for improving the level of business intelligence and knowledge management throughout an organization.... Sample PDF
Data Mining Lessons Learned in the Federal Government
Chapter 78
Seung Ki Moon
Many companies strive to maximize resource utilization by sharing and reusing distributed design knowledge and information when developing new... Sample PDF
A Data Mining Methodology for Product Family Design
Chapter 79
Data Mining on XML Data  (pages 506-510)
Qin Ding
With the growing usage of XML data for data storage and exchange, there is an imminent need to develop efficient algorithms to perform data mining... Sample PDF
Data Mining on XML Data
Chapter 80
Christophe Giraud-Carrier
It is sometimes argued that all one needs to engage in Data Mining (DM) is data and a willingness to “give it a try.” Although this view is... Sample PDF
Data Mining Tool Selection
Chapter 81
Amin A. Abdulghani
A lot of interest has been expressed in database mining using association rules (Agrawal, Imielinski, & Swami, 1993). In this chapter, we provide a... Sample PDF
Data Mining with Cubegrades
Chapter 82
Hai Wang, Shouhong Wang
Survey is one of the common data acquisition methods for data mining (Brin, Rastogi & Shim, 2003). In data mining one can rarely find a survey data... Sample PDF
Data Mining with Incomplete Data
Chapter 83
Mohammed Alshalalfa
Data mining can be described as data processing using sophisticated data search capabilities and statistical algorithms to discover patterns and... Sample PDF
Data Pattern Tutor for AprioriAll and PrefixSpan
Chapter 84
Magdi Kamel
Practical experience of data mining has revealed that preparing data is the most time-consuming phase of any data mining project. Estimates of the... Sample PDF
Data Preparation for Data Mining
Chapter 85
Data Provenance  (pages 544-549)
Vikram Sorathia
In recent years, our sensing capability has increased manifold. The developments in sensor technology, telecommunication, computer networking and... Sample PDF
Data Provenance
Chapter 86
William E. Winkler
Fayyad and Uthursamy (2002) have stated that the majority of the work (representing months or years) in creating a data warehouse is in cleaning up... Sample PDF
Data Quality in Data Warehouses
Chapter 87
Richard Jensen
Data reduction is an important step in knowledge discovery from data. The high dimensionality of databases can be reduced using suitable techniques... Sample PDF
Data Reduction with Rough Sets
Chapter 88
Data Streams  (pages 561-565)
João Gama, Pedro Pereira Rodrigues
Nowadays, data bases are required to store massive amounts of data that are continuously inserted, and queried. Organizations use decision support... Sample PDF
Data Streams
Chapter 89
Amitava Mitra
As the abundance of collected data on products, processes and service-related operations continues to grow with technology that facilitates the ease... Sample PDF
Data Transformation for Normalization
Chapter 90
Alkis Simitsis, Dimitri Theodoratos
The back-end tools of a data warehouse are pieces of software responsible for the extraction of data from several sources, their cleansing... Sample PDF
Data Warehouse Back-End Tools
Chapter 91
Beixin ("Betsy") Lin, Yu Hong, Zu-Hsu Lee
A data warehouse is a large electronic repository of information that is generated and updated in a structured manner by an enterprise over time to... Sample PDF
Data Warehouse Performance
Chapter 92
Richard Mathieu
Every finished product has gone through a series of transformations. The process begins when manufacturers purchase the raw materials that will be... Sample PDF
Data Warehousing and Mining in Supply Chains
Chapter 93
Yuefeng Li
With the phenomenal growth of electronic data and information, there are many demands for developments of efficient and effective systems (tools) to... Sample PDF
Data Warehousing for Association Mining
Chapter 94
Lutz Hamel
Modern, commercially available relational database systems now routinely include a cadre of data retrieval and analysis tools. Here we shed some... Sample PDF
Database Queries, Data Mining, and OLAP
Chapter 95
Patricia E.N. Lutu
In data mining, sampling may be used as a technique for reducing the amount of data presented to a data mining algorithm. Other strategies for data... Sample PDF
Database Sampling for Data Mining
Chapter 96
Edgar R. Weippl
In this article we will present an introduction to issues relevant to database security and statistical database security. We will briefly cover... Sample PDF
Database Security and Statistical Database Security
Chapter 97
Martin Žnidaršic, Marko Bohanec, Blaž Zupan
Computer models are representations of problem environment that facilitate analysis with high computing power and representation capabilities. They... Sample PDF
Data-Driven Revision of Decision Models
Chapter 98
Decision Tree Induction  (pages 624-630)
Roberta Siciliano, Claudio Conversano
Decision Tree Induction (DTI) is a tool to induce a classification or regression model from (usually large) datasets characterized by n objects... Sample PDF
Decision Tree Induction
Chapter 99
Monica Maceli, Min Song
With the increase in Web-based databases and dynamically- generated Web pages, the concept of the “deep Web” has arisen. The deep Web refers to Web... Sample PDF
Deep Web Mining through Web Services
Chapter 100
Matteo Golfarelli
Conceptual modeling is widely recognized to be the necessary foundation for building a database that is well-documented and fully satisfies the user... Sample PDF
DFM as a Conceptual Model for Data Warehouse
Chapter 101
Hanghang Tong, Yehuda Koren, Christos Faloutsos
In many graph mining settings, measuring node proximity is a fundamental problem. While most of existing measurements are (implicitly or explicitly)... Sample PDF
Direction-Aware Proximity on Graphs
Chapter 102
Takao Ito
One of the most important issues in data mining is to discover an implicit relationship between words in a large corpus and labels in a large... Sample PDF
Discovering an Effective Measure in Data Mining
Chapter 103
Richi Nayak
XML is the new standard for information exchange and retrieval. An XML document has a schema that defines the data definition and structure of the... Sample PDF
Discovering Knowledge from XML Documents
Chapter 104
Jan H Kroeze
A very large percentage of business and academic data is stored in textual format. With the exception of metadata, such as author, date, title and... Sample PDF
Discovering Unknown Patterns in Free Text
Chapter 105
William W. Agresti
It is routine to hear and read about the information explosion, how we are all overwhelmed with data and information. Is it progress when our search... Sample PDF
Discovery Informatics from Data to Knowledge
Chapter 106
Haiquan Li, Jinyan Li, Xuechun Zhao
Physical interactions between proteins are important for many cellular functions. Since protein-protein interactions are mediated via their... Sample PDF
Discovery of Protein Interaction Sites
Chapter 107
Vladimír Bartík
Association rules are one of the most frequently used types of knowledge discovered from databases. The problem of discovering association rules was... Sample PDF
Distance-Based Methods for Association Rule Mining
Chapter 108
Mafruz Zaman Ashrafi
Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful... Sample PDF
Distributed Association Rule Mining
Chapter 109
Yu Chen, Wei-Shinn Ku
The information technology has revolutionized almost every facet of our lives. Government, commercial, and educational organizations depend on... Sample PDF
Distributed Data Aggregation Technology for Real-Time DDoS Attacks Detection
Chapter 110
Distributed Data Mining  (pages 709-715)
Grigorios Tsoumakas
The continuous developments in information and communication technology have recently led to the appearance of distributed computing environments... Sample PDF
Distributed Data Mining
Chapter 111
José Ignacio Serrano
Owing to the growing amount of digital information stored in natural language, systems that automatically process text are of crucial importance and... Sample PDF
Document Indexing Techniques for Text Mining
Chapter 112
Dynamic Data Mining  (pages 722-728)
Richard Weber
Since the First KDD Workshop back in 1989 when “Knowledge Mining” was recognized as one of the top 5 topics in future database research... Sample PDF
Dynamic Data Mining
Chapter 113
Chang-Chia Liu, W. Art Chaovalitwongse, Panos M. Pardalos, Basim M. Uthman
Neurologists typically study the brain activity through acquired biomarker signals such as Electroencephalograms (EEGs) which have been widely used... Sample PDF
Dynamical Feature Extraction from Brain Activity Time Series
Chapter 114
Efficient Graph Matching  (pages 736-743)
Diego Reforgiato Recupero
Application domains such as bioinformatics and web technology represent complex objects as graphs where nodes represent basic objects (i.e. atoms... Sample PDF
Efficient Graph Matching
Chapter 115
Xunkai Wei
As known to us, the cognition process is the instinct learning ability of the human being. This process is perhaps one of the most complex human... Sample PDF
Enclosing Machine Learning
Chapter 116
Daniel Crabtree
Web search engines help users find relevant web pages by returning a result set containing the pages that best match the user’s query. When the... Sample PDF
Enhancing Web Search through Query Expansion
Chapter 117
Ji-Rong Wen
Web query log is a type of file keeping track of the activities of the users who are utilizing a search engine. Compared to traditional information... Sample PDF
Enhancing Web Search through Query Log Mining
Chapter 118
Ji-Rong Wen
The Web is an open and free environment for people to publish and get information. Everyone on the Web can be either an author, a reader, or both.... Sample PDF
Enhancing Web Search through Web Structure Mining
Chapter 119
Nikunj C. Oza
Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple... Sample PDF
Ensemble Data Mining Methods
Chapter 120
Niall Rooney
The concept of ensemble learning has its origins in research from the late 1980s/early 1990s into combining a number of artificial neural networks...
Ensemble Learning for Regression
Chapter 121
Ethics of Data Mining  (pages 783-788)
Jack Cook
Decision makers thirst for answers to questions. As more data is gathered, more questions are posed: Which customers are most likely to respond...
Chapter 122
Paolo Giudici
Several classes of computational and statistical methods for data mining are available. Each class can be parameterised so that models within the...
Evaluation of Data Mining Methods
Chapter 123
Ivan Bruha
A ‘traditional’ learning algorithm that can induce a set of decision rules usually represents a robust and comprehensive system that discovers a...
Evaluation of Decision Rules by Qualities for Decision-Making Systems
Chapter 124
Caitlin Kelly Maurie
Geospatial data and the technologies that drive them have altered the landscape of our understanding of the world around us. The data, software and...
The Evolution of SDI Geospatial Data Clearinghouses
Chapter 125
Amit Saxena, Megha Kothari, Navneet Pandey
Excess of data due to different voluminous storage and online devices has become a bottleneck to seek meaningful information therein and we are...
Evolutionary Approach to Dimensionality Reduction
Chapter 126
William H. Hsu
A genetic algorithm (GA) is a method used to find approximate solutions to difficult search, optimization, and machine learning problems (Goldberg...
Evolutionary Computation and Genetic Algorithms
Chapter 127
Laetitia Jourdan
Knowledge discovery from genomic data has become an important research area for biologists. Nowadays, a lot of data is available on the web, but the...
Evolutionary Data Mining for Genomics
Chapter 128
Daniel Rivero
Artificial Neural Networks (ANNs) are learning systems from the Artificial Intelligence (AI) world that have been used for solving complex problems...
Evolutionary Development of ANNs for Data Mining
Chapter 129
Jorge Muruzábal
Ensemble rule based classification methods have been popular for a while in the machine-learning literature (Hand, 1997). Given the advent of...
Evolutionary Mining of Rule Ensembles
Chapter 130
Yiyu Yao
The objective of data mining is to discover new and useful knowledge, in order to gain a better understanding of nature. This in fact is the goal of...
On Explanation-Oriented Data Mining
Chapter 131
Elzbieta Malinowski, Esteban Zimányi
Data warehouses keep large amounts of historical data in order to help users at different management levels to make more effective decisions....
Extending a Conceptual Multidimensional Model for Representing Spatial Data
Chapter 132
Facial Recognition  (pages 857-862)
Rory A. Lewis, Zbigniew W. Ras
Over the past decade Facial Recognition has become more cohesive and reliable than ever before. We begin with an analysis explaining why certain...
Chapter 133
Seoung Bum Kim
Development of advanced sensing technology has multiplied the volume of spectral data, which is one of the most common types of data encountered in...
Feature Extraction/Selection in High-Dimensional Spectral Data
Chapter 134
Shouxian Cheng, Frank Y. Shih
The Support Vector Machine (SVM) (Cortes and Vapnik, 1995; Vapnik, 1995; Burges, 1998) is intended to generate an optimal separating hyperplane by...
Feature Reduction for Support Vector Machines
Chapter 135
Feature Selection  (pages 878-882)
Damien François
In many applications, like function approximation, pattern recognition, time series prediction, and data mining, one has to build a model relating...
Chapter 136
Indranil Bose
Movement of stocks in the financial market is a typical example of financial time series data. It is generally believed that past performance of a...
Financial Time Series Data Mining
Chapter 137
Hong Shen
The discovery of association rules showing conditions of data co-occurrence has attracted the most attention in data mining. An example of an...
Flexible Mining of Association Rules
Chapter 138
Jamil M. Saquer
Formal concept analysis (FCA) is a branch of applied mathematics with roots in lattice theory (Wille, 1982; Ganter & Wille, 1999). It deals with the...
Formal Concept Analysis Based Clustering
Chapter 139
Xuan Hong Dang, Wee-Keong Ng, Kok-Leong Ong, Vincent Lee
In recent years, data streams have emerged as a new data type that has attracted much attention from the data mining community. They arise naturally...
Frequent Sets Mining in Data Stream Environments
Chapter 140
Eyke Hüllermeier
Tools and techniques that have been developed during the last 40 years in the field of fuzzy set theory (FST) have been applied quite successfully...
Fuzzy Methods in Data Mining
Chapter 141
Michel Schneider
Basically, the schema of a data warehouse rests on two kinds of elements: facts and dimensions. Facts are used to record measures about situations...
A General Model for Data Warehouses
Chapter 142
Ladjel Bellatreche
Decision support applications require complex queries, e.g., multi-way joins defined on huge warehouses usually modelled using star schemas, i.e....
A Genetic Algorithm for Selecting Horizontal Fragments
Chapter 143
Genetic Programming  (pages 926-931)
William H. Hsu
Genetic programming (GP) is a sub-area of evolutionary computation first explored by John Koza (1992) and independently developed by Nichael Lynn...
Chapter 144
Alex A. Freitas, Gisele L. Pappa
At present there is a wide range of data mining algorithms available to researchers and practitioners (Witten & Frank, 2005; Tan et al., 2006)....
Genetic Programming for Automatically Constructing Data Mining Algorithms
Chapter 145
Marek Kretowski, Marek Grzes
Decision trees are, besides decision rules, one of the most popular forms of knowledge representation in the Knowledge Discovery in Databases process...
Global Induction of Decision Trees
Chapter 146
Graph-Based Data Mining  (pages 943-949)
Lawrence B. Holder
Graph-based data mining represents a collection of techniques for mining the relational aspects of data represented as a graph. Two major approaches...
Chapter 147
Graphical Data Mining  (pages 950-956)
Carol J. Romanowski
Data mining has grown to include many more data types than the “traditional” flat files with numeric or categorical attributes. Images, text, video...
Chapter 148
Liang Xiong
When we are faced with data, one common task is to learn the correspondence relationship between different data sets. More concretely, by learning...
Guide Manifold Alignment by Relative Comparisons
Chapter 149
Guided Sequence Alignment  (pages 964-969)
Abdullah N. Arslan
Sequence alignment is one of the most fundamental problems in computational biology. Ordinarily, the problem aims to align symbols of given...
Chapter 150
Benjamin C.M. Fung, Ke Wang, Martin Ester
Document clustering is an automatic grouping of text documents into clusters so that documents within a cluster have high similarity in comparison...
Hierarchical Document Clustering
Chapter 151
Francesco Buccafurri
Histograms are an important tool for data reduction both in the field of data-stream querying and in OLAP, since they allow us to represent large...
Histograms for OLAP and Data-Stream Queries
Chapter 152
Bhavani Thuraisingham
Data mining is the process of posing queries to large quantities of data and extracting information often previously unknown using mathematical...
Homeland Security Data Mining and Link Analysis
Chapter 153
Janet Delve
Data Warehousing is now a well-established part of the business and scientific worlds. However, up until recently, data warehouses were restricted...
Humanities Data Warehousing
Chapter 154
Sancho Salcedo-Sanz, Gustavo Camps-Valls, Carlos Bousoño-Calzón
Genetic algorithms (GAs) are a class of problem solving techniques which have been successfully applied to a wide variety of hard problems...
Hybrid Genetic Algorithms in Data Mining Applications
Chapter 155
Marvin L. Brown, John F. Kros
Missing or inconsistent data has been a pervasive problem in data analysis since the origin of data collection. The management of missing data in...
Imprecise Data and the Data Mining Process
Chapter 156
Incremental Learning  (pages 1006-1012)
Abdelhamid Bouchachia
Data mining and knowledge discovery is about creating a comprehensible model of the data. Such a model may take different forms going from simple...
Chapter 157
Seokkyung Chung
With the rapid growth of the World Wide Web, Internet users are now experiencing overwhelming quantities of online information. Since manually...
Incremental Mining from News Streams
Chapter 158
Honghua Dai
Inexact field learning (IFL) (Ciesieski & Dai, 1994; Dai & Ciesieski, 1994a, 1994b, 1995, 2004; Dai & Li, 2001) is a rough-set, theory-based...
Inexact Field Learning Approach for Data Mining
Chapter 159
Gary G. Yen
Scientific literatures can be organized to serve as a roadmap for researchers by pointing where and when the scientific community has been and is...
Information Fusion for Scientific Literature Classification
Chapter 160
Benjamin Griffiths
Rough Set Theory (RST), since its introduction in Pawlak (1982), continues to develop as an effective tool in data mining. Within a set theoretical...
Information Veins and Resampling with Rough Set Theory
Chapter 161
Instance Selection  (pages 1041-1045)
Huan Liu
The amounts of data have become increasingly large in recent years as the capacity of digital data storage worldwide has significantly increased. As the...
Chapter 162
Stephan Meisel
Basically, Data Mining (DM) and Operations Research (OR) are two paradigms independent of each other. OR aims at optimal solutions of decision...
Integration of Data Mining and Operations Research
Chapter 163
Andreas Koeller
Integration of data sources refers to the task of developing a common schema as well as data transformation solutions for a number of data sources...
Integration of Data Sources through Data Mining
Chapter 164
Sai Moturu
As John Muir noted, “When we try to pick out anything by itself, we find it hitched to everything else in the Universe” (Muir, 1911). In tune with...
Integrative Data Analysis for Biological Discovery
Chapter 165
P. Punitha, D.S. Guru
‘A visual idea is more powerful than verbal idea’, ‘A picture is worth more than ten thousand words’, ‘No words can convey what a picture speaks’...
Intelligent Image Archival and Retrieval System
Chapter 166
Intelligent Query Answering  (pages 1073-1078)
Zbigniew W. Ras, Agnieszka Dardzinska
One way to make Query Answering System (QAS) intelligent is to assume a hierarchical structure of its attributes. Such systems have been...
Chapter 167
Zheng Zhao
The high dimensionality of data poses a challenge to learning tasks such as classification. In the presence of many irrelevant features...
On Interacting Features in Subset Selection
Chapter 168
On Interactive Data Mining  (pages 1085-1090)
Yan Zhao
Exploring and extracting knowledge from data is one of the fundamental problems in science. Data mining consists of important tasks, such as...
Chapter 169
Interest Pixel Mining  (pages 1091-1096)
Qi Li, Jieping Ye, Chandra Kambhamettu
Visual media data such as an image is the raw data representation for many important applications, such as image retrieval (Mikolajczyk & Schmid...
Chapter 170
Gustavo Camps-Valls, Manel Martínez-Ramón, José Luis Rojo-Álvarez
Machine learning has experienced a great advance in the eighties and nineties due to the active research in artificial neural networks and adaptive...
An Introduction to Kernel Methods
Chapter 171
Malcolm J. Beynon
The essence of data mining is to investigate for pertinent information that may exist in data (often large data sets). The immeasurably large amount...
The Issue of Missing Values in Data Mining
Chapter 172
Doina Caragea, Vasant Honavar
Recent advances in sensors, digital storage, computing and communications technologies have led to a proliferation of autonomously operated...
Knowledge Acquisition from Semantically Heterogeneous Data
Chapter 173
QingXiang Wu, Martin McGinnity, Girijesh Prasad, David Bell
Data mining and knowledge discovery aim at finding useful information from typically massive collections of data, and then extracting useful...
Knowledge Discovery in Databases with Diversity of Data Types
Chapter 174
Learning Bayesian Networks  (pages 1124-1128)
Marco F. Ramoni, Paola Sebastiani
Born at the intersection of artificial intelligence, statistics, and probability, Bayesian networks (Pearl, 1988) are a representation formalism at...
Chapter 175
Rallou Thomopoulos
This chapter deals with the problem of the cooperation of heterogeneous knowledge for the construction of a domain expertise, and more specifically...
Learning Exceptions to Refine a Domain Expertise
Chapter 176
Learning from Data Streams  (pages 1137-1141)
João Gama, Pedro Pereira Rodrigues
In the last two decades, machine learning research and practice has focused on batch learning usually with small datasets. In batch learning, the...
Chapter 177
Bojun Yan
As a recent emerging technique, semi-supervised clustering has attracted significant research interest. Compared to traditional clustering...
Learning Kernels for Semi-Supervised Clustering
Chapter 178
Feng Pan
As an essential dimension of our information space, time plays a very important role in every aspect of our lives. Temporal information is...
Learning Temporal Information from Text
Chapter 179
Abdelhamid Bouchachia
Recently the field of machine learning, pattern recognition, and data mining has witnessed a new research stream that is learning with partial...
Learning with Partial Supervision
Chapter 180
Kirsten Wahlstrom, John F. Roddick, Rick Sarre, Vladimir Estivill-Castro, Denise de Vries
To paraphrase Winograd (1992), we bring to our communities a tacit comprehension of right and wrong that makes social responsibility an intrinsic...
Legal and Technical Issues of Privacy Preservation in Data Mining
Chapter 181
Yinghui Yang, Balaji Padmanabhan
Classification is a form of data analysis that can be used to extract models to predict categorical class labels (Han & Kamber, 2001). Data...
Leveraging Unlabeled Data for Classification
Chapter 182
Carlotta Domeniconi, Dimitrios Gunopulos
Pattern classification is a very general concept with numerous applications ranging from science, engineering, target marketing, medical diagnosis...
Locally Adaptive Techniques for Pattern Classification
Chapter 183
Xiang Zhang, Seza Orcun, Mourad Ouzzani, Cheolhwan Oh
Systems biology aims to understand biological systems on a comprehensive scale, such that the components that make up the whole are connected to one...
Mass Informatics in Differential Proteomics
Chapter 184
Dimitri Theodoratos, Wugang Xu, Alkis Simitsis
A Data Warehouse (DW) is a repository of information retrieved from multiple, possibly heterogeneous, autonomous, distributed databases and other...
Materialized View Selection for Data Warehouse Design
Chapter 185
Jun Zhang, Jie Wang, Shuting Xu
Data mining technologies have now been used in commercial, industrial, and governmental businesses, for various purposes, ranging from increasing...
Matrix Decomposition Techniques for Data Privacy
Chapter 186
Raymond K. Pon, Alfonso F. Cardenas, David J. Buttler
An explosive growth of online news has taken place. Users are inundated with thousands of news articles, only some of which are interesting. A...
Measuring the Interestingness of News Articles
Chapter 187
Miguel García Torres
Metaheuristics are general strategies for designing heuristic procedures with high performance. The term metaheuristic, which appeared in 1986...
Metaheuristics in Data Mining
Chapter 188
Meta-Learning  (pages 1207-1215)
Christophe Giraud-Carrier, Pavel Brazdil, Carlos Soares, Ricardo Vilalta
The application of Machine Learning (ML) and Data Mining (DM) tools to classification and regression tasks has become a standard, not only in...
Chapter 189
Xinghua Fan
Entity and relation recognition, i.e. assigning semantic classes (e.g., person, organization and location) to entities in a given sentence and...
A Method of Recognizing Entity and Relation
Chapter 190
Microarray Data Mining  (pages 1224-1230)
Li-Min Fu
Based on the concept of simultaneously studying the expression of a large number of genes, a DNA microarray is a chip on which numerous probes are...
Chapter 191
Diego Liberati
In everyday life, it often turns out that one has to face a huge amount of data, often not completely homogeneous, often without an immediate grasp...
Minimum Description Length Adaptive Bayesian Mining
Chapter 192
Li Shen, Fillia Makedon
Recent technological advances in 3D digitizing, noninvasive scanning, and interactive authoring have resulted in an explosive growth of 3D models in...
Mining 3D Shape Data for Morphometric Pattern Discovery
Chapter 193
Mining Chat Discussions  (pages 1243-1247)
Stanley Loh, Daniel Licthnow, Thyago Borges, Tiago Primo
According to Nonaka & Takeuchi (1995), the majority of the organizational knowledge comes from interactions between people. People tend to reuse...
Chapter 194
Mining Data Streams  (pages 1248-1256)
Tamraparni Dasu, Gary Weiss
When a space shuttle takes off, tiny sensors measure thousands of data points every fraction of a second, pertaining to a variety of attributes like...
Chapter 195
Gabriele Kern-Isberner
Knowledge discovery refers to the process of extracting new, interesting, and useful knowledge from data and presenting it in an intelligible way to...
Mining Data with Group Theoretical Means
Chapter 196
Mining Email Data  (pages 1262-1267)
Steffen Bickel
E-mail has become one of the most important communication media for business and private purposes. Large amounts of past e-mail records reside on...
Chapter 197
Wen-Yang Lin, Ming-Cheng Tseng
The mining of Generalized Association Rules (GARs) from a large transactional database in the presence of item taxonomy has been recognized as an...
Mining Generalized Association Rules in an Evolving Environment
Chapter 198
Doru Tanasa
Web Usage Mining (WUM) includes all the Data Mining techniques used to analyze the behavior of a Web site’s users (Cooley, Mobasher & Srivastava...
Mining Generalized Web Data for Discovering Usage Patterns
Chapter 199
Mining Group Differences  (pages 1282-1286)
Shane M. Butler
Finding differences among two or more groups is an important data-mining task. For example, a retailer might want to know what the difference is in...
Chapter 200
Junsong Yuan
One of the focused themes in data mining research is to discover frequent and repetitive patterns from the data. The success of frequent pattern...
Mining Repetitive Patterns in Multimedia Data
Chapter 201
Bruno Agard
In large urban areas, smooth running public transit networks are key to viable development. Currently, economic and environmental issues are fueling...
Mining Smart Card Data from an Urban Transit Network
Chapter 202
David Lo
Software is a ubiquitous component in our daily life. It ranges from large software systems like operating systems to small embedded systems like...
Mining Software Specifications
Chapter 203
Ramon F. Brena, Ana Maguitman
The Internet has made available a large number of information services, such as file sharing, electronic mail, online chat, telephony and file...
Mining the Internet for Concepts
Chapter 204
Lutz Hamel
Classification models and in particular binary classification models are ubiquitous in many branches of science and business. Consider, for example...
Model Assessment with ROC Curves
Chapter 205
Modeling Quantiles  (pages 1324-1329)
Claudia Perlich, Saharon Rosset, Bianca Zadrozny
One standard Data Mining setting is defined by a set of n observations on a variable of interest Y and a set of p explanatory variables, or...
Chapter 206
Modeling Score Distributions  (pages 1330-1336)
Anca Doloc-Mihu
The goal of a web-based retrieval system is to find data items that meet a user’s request as fast and accurately as possible. Such a search engine...
Chapter 207
Modeling the KDD Process  (pages 1337-1345)
Vasudha Bhatnagar, S. K. Gupta
Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and...
Chapter 208
Pasquale De Meo, Giovanni Quattrone, Giorgio Terracina, Domenico Ursino
An Electronic-Service (E-Service) can be defined as a collection of network-resident software programs that collaborate for supporting users in both...
A Multi-Agent System for Handling Adaptive E-Services
Chapter 209
Chia Huey Ooi
Molecular classification involves the classification of samples into groups of biological phenotypes. Studies on molecular classification generally...
Multiclass Molecular Classification
Chapter 210
Omar Boussaid, Doulkifli Boukraa
While classical databases aimed at managing data within enterprises, data warehouses help them to analyze data in order to drive their...
Multidimensional Modeling of Complex Data
Chapter 211
Fadime Üney Yüksektepe
Data classification is a supervised learning strategy that analyzes the organization and categorization of data in distinct classes. Generally, a...
Multi-Group Data Classification via MILP
Chapter 212
Amelia Zafra
The multiple-instance problem is a difficult machine learning problem that appears in cases where knowledge about training examples is incomplete....
Multi-Instance Learning with MultiObjective Genetic Programming
Chapter 213
Multilingual Text Mining  (pages 1380-1385)
Peter A. Chew
The principles of text mining are fundamental to technology in everyday use. The world wide web (WWW) has in many senses driven research in text...
Chapter 214
Gang Kou, Yi Peng, Yong Shi
Multiple criteria optimization seeks to simultaneously optimize two or more objective functions under a set of constraints. It has a great variety...
Multiple Criteria Optimization in Data Mining
Chapter 215
Sach Mukherjee
A number of important problems in data mining can be usefully addressed within the framework of statistical hypothesis testing. However, while the...
Multiple Hypothesis Testing for Data Mining
Chapter 216
Music Information Retrieval  (pages 1396-1402)
Alicja A. Wieczorkowska
Music information retrieval (MIR) is a multi-disciplinary research on retrieving information from music, see Fig. 1. This research involves...
Chapter 217
Ingrid Fischer
The introduction of the artificial neuron by McCulloch and Pitts is considered the beginning of the area of artificial neural networks. They were...
Neural Networks and Graph Transformations
Chapter 218
Victor S.Y. Lo
Data mining has been widely applied in many areas over the past two decades. In marketing, many firms collect large amounts of customer data to...
New Opportunities in Marketing Data Mining
Chapter 219
Dilip Kumar Pratihar
Most of the complex real-world systems involve more than three dimensions and it may be difficult to model these higher dimensional data related to...
Non-Linear Dimensionality Reduction Techniques
Chapter 220
Ioannis N. Kouris
Research in association rules mining initially concentrated on solving the obvious problem of finding positive association rules; that is rules...
A Novel Approach on Negative Association Rules
Chapter 221
Indrani Chakravarty
The most commonly used protection mechanisms today are based on either what a person possesses (e.g. an ID card) or what the person remembers (like...
Offline Signature Recognition
Chapter 222
Alfredo Cuzzocrea, Svetlana Mansmann
The problem of efficiently visualizing multidimensional data sets produced by scientific and statistical tasks/processes is becoming increasingly...
OLAP Visualization: Models, Issues, and Techniques
Chapter 223
Rebecca Boon-Noi Tan
Since its origin in the 1970s, research and development into database systems has evolved from simple file storage and processing systems to...
Online Analytical Processing Systems
Chapter 224
Online Signature Recognition  (pages 1456-1462)
Indrani Chakravarty
Security is one of the major issues in today’s world and most of us have to deal with some sort of passwords in our daily lives; but, these...
Chapter 225
James Geller
The term “Ontology” was popularized in Computer Science by Thomas Gruber at the Stanford Knowledge Systems Lab (KSL). Gruber’s highly influential...
Ontologies and Medical Terminologies
Chapter 226
Order Preserving Data Mining  (pages 1470-1475)
Ioannis N. Kouris
Data mining has emerged over the last decade as probably the most important application in databases. To reproduce one of the most popular but...
Chapter 227
Outlier Detection  (pages 1476-1482)
Sharanjit Kaur
Knowledge discovery in databases (KDD) is a nontrivial process of detecting valid, novel, potentially useful and ultimately understandable patterns...
Chapter 228
Fabrizio Angiulli
Data mining techniques can be grouped in four main categories: clustering, classification, dependency detection, and outlier detection. Clustering...
Outlier Detection Techniques for Data Mining
Chapter 229
Jorge Cardoso, W.M.P. van der Aalst
Business process management systems (Smith and Fingar 2003) provide a fundamental infrastructure to define and manage business processes and...
Path Mining and Process Mining for Workflow Management Systems
Chapter 230
Andrew K.C. Wong, Yang Wang, Gary C.L. Li
A basic task of machine learning and data mining is to automatically uncover patterns that reflect regularities in a data set. When dealing with a...
Pattern Discovery as Event Association
Chapter 231
Hui Xiong, Michael Steinbach, Pang-Ning Tan, Vipin Kumar, Wenjun Zhou
Clustering and association analysis are important techniques for analyzing data. Cluster analysis (Jain & Dubes, 1988) provides insight into the...
Pattern Preserving Clustering
Chapter 232
P. Viswanath, Narasimha M. Murty, Bhatnagar Shalabh
Parametric methods first choose the form of the model or hypotheses and estimate the necessary parameters from the given dataset. The form, which...
Pattern Synthesis for Nonparametric Pattern Recognition
Chapter 233
C. Radha
An important problem in pattern recognition is that of pattern classification. The objective of classification is to determine a discriminant...
Pattern Synthesis in SVM Based Classifier
Chapter 234
Clifton Phua, Vincent Lee, Kate Smith-Miles
Almost every person has a life-long personal name which is officially recognised and has only one correct version in their language. Each personal...
The Personal Name Problem and a Data Mining Solution
Chapter 235
Konstantinos Kotis
Current keyword-based Web search engines (e.g. Google) provide access to thousands of people for billions of indexed Web pages. Although the amount...
Perspectives and Key Technologies of Semantic Web Search
Chapter 236
Nilmini Wickramasinghe, Rajeev K. Bali
Today’s economy is increasingly based on knowledge and information (Davenport & Grover, 2001). Knowledge is now recognized as the driver of...
A Philosophical Perspective on Knowledge Creation
Chapter 237
Ladjel Bellatreche, Mukesh Mohania
Recently, organizations have increasingly emphasized applications in which current and historical data are analyzed and explored comprehensively...
Physical Data Warehousing Design
Chapter 238
Xiao-Li Li
In traditional supervised learning, a large number of labeled positive and negative examples are typically required to learn an accurate classifier....
Positive Unlabelled Learning for Document Classification
Chapter 239
D. R. Mani, Andrew L. Betz, James H. Drew
A structural conflict exists in businesses which sell services whose production costs are discontinuous and whose consumption is continuous but...
Predicting Resource Usage for Capital Efficient Marketing
Chapter 240
Seung-won Hwang
As a near-infinite amount of data is becoming accessible on the Web, it becomes more important to support intelligent personalized retrieval...
Preference Modeling and Mining for Personalization
Chapter 241
Alfredo Cuzzocrea, Vincenzo Russo
The problem of ensuring the privacy and security of OLAP data cubes (Gray et al., 1997) arises in several fields ranging from advanced Data...
Privacy Preserving OLAP and OLAP Security
Chapter 242
Stanley R.M. Oliveira
Despite its benefits in various areas (e.g., business, medical analysis, scientific data analysis, etc), the use of data mining techniques can also...
Privacy-Preserving Data Mining
Chapter 243
Laura Maruster
As on-line services and Web-based information systems proliferate in many domains of activities, it has become increasingly important to model...
Process Mining to Analyze the Behaviour of Specific Users
Chapter 244
Profit Mining  (pages 1598-1602)
Senqiang Zhou
A major obstacle in data mining applications is the gap between the statistic-based pattern extraction and the value-based decision-making. “Profit...
Chapter 245
Ioannis N. Kouris
Software development has various stages that can be conceptually grouped into two phases, namely development and production (Figure 1). The...
Program Comprehension through Data Mining
Chapter 246
Minh Ngoc Ngo
Due to the need to reengineer and migrate aging software and legacy systems, reverse engineering has started to receive some attention. It has now...
Program Mining Augmented with Empirical Properties
Chapter 247
Ping Deng, Qingkai Ma, Weili Wu
Clustering can be considered as the most important unsupervised learning problem. It has been discussed thoroughly by both statistics and database...
Projected Clustering for Biological Data Analysis
Chapter 248
Imad Khoury, Godfried Toussaint, Antonio Ciampi, Isadora Antoniano
Clustering is considered the most important aspect of unsupervised learning in data mining. It deals with finding structure in a collection of...
Proximity-Graph-Based Tools for DNA Clustering
Chapter 249
Yang Xiang
Graphical models such as Bayesian networks (BNs) (Pearl, 1988; Jensen & Nielsen, 2007) and decomposable Markov networks (DMNs) (Xiang, Wong., &...
Pseudo-Independent Models and Decision Theoretic Knowledge Discovery
Chapter 250
Wen-Chi Hou
Mining market basket data (Agrawal et al., 1993; Agrawal et al., 1994) has received a great deal of attention in the recent past, partly due to its...
Quality of Association Rules by Chi-Squared Test
Chapter 251
Andrew Hamilton-Wright, Daniel W. Stashuk
A great deal of interesting real-world data is encountered through the analysis of continuous variables; however, many of the robust tools for rule...
Quantization of Continuous Data for Pattern Based Rule Extraction
Chapter 252
Colin Cooper, Michele Zito
The association rule mining (ARM) problem is a well-established topic in the field of knowledge discovery in databases. The problem addressed by ARM...
Realistic Data for Testing Rule Mining Algorithms
Chapter 253
Brian C. Lovell, Shaokang Chen, Ting Shan
Data mining is widely used in various areas such as finance, marketing, communication, web service, surveillance and security. The continuing growth...
Real-Time Face Detection and Classification for ICCTV
Chapter 254
Marzena Kryszkiewicz
Discovery of frequent patterns in large databases is an important data mining problem. The problem was introduced in (Agrawal, Imielinski & Swami...
Reasoning about Frequent Patterns with Negation
Chapter 255
Nicolas Lachiche
Receiver Operating Characteristic (ROC) curves have been used for years in decision making from signals, such as radar or radiology. Basically they...
Receiver Operating Characteristic (ROC) Analysis
Chapter 256
Juha Kontio
Reporting is one of the basic processes in all organizations. Reports should offer relevant information for guiding decision-making. Reporting...
Reflecting Reporting Problems and Data Warehousing
Chapter 257
Brian C. Lovell, Shaokang Chen, Ting Shan
While the technology for mining text documents in large databases could be said to be relatively mature, the same cannot be said for mining other...
Robust Face Recognition for Data Mining
Chapter 258
Jerzy W. Grzymala-Busse, Wojciech Ziarko
Discovering useful models capturing regularities of natural phenomena or complex systems was, until recently, almost entirely limited to finding...
Rough Sets and Data Mining
Chapter 259
Gautam Das
In recent years, advances in data collection and management technologies have led to a proliferation of very large databases. These large data...
Sampling Methods in Approximate Query Answering Systems
Chapter 260
V. Suresh Babu, P. Viswanath, Narasimha M. Murty
Non-parametric methods like the nearest neighbor classifier (NNC) and the Parzen-Window based density estimation (Duda, Hart & Stork, 2000) are more...
Scalable Non-Parametric Methods for Large Data Sets
Chapter 261
Mike Thelwall
Scientific Web Intelligence (SWI) is a research field that combines techniques from data mining, web intelligence and scientometrics to extract...
Scientific Web Intelligence
Chapter 262
Päivikki Parpola
Some parts of this text, namely “Co-operative Building, Adaptation, and Evolution of Abstract Models of a KB” and most subsections in “Performing...
Seamless Structured Knowledge Acquisition
Chapter 263
Hadrian Peter
Over the past ten years or so data warehousing has emerged as a new technology in the database environment. “A data warehouse is a global repository...
Search Engines and their Impact on Data Warehouses
Chapter 264
Nils Pharo
Several studies of Web information searching (Agosto, 2002; Pharo & Järvelin, 2006; Prabha et al., 2007) have pointed out that searchers tend to...
Search Situations and Transitions
Chapter 265
Shuguo Han
Rapid advances in automated data collection tools and data storage technology have led to the wide availability of huge amounts of data. Data mining...
Secure Building Blocks for Data Privacy
Chapter 266
Yehuda Lindell
The increasing use of data mining tools in both the public and private sectors raises concerns regarding the potentially sensitive nature of much of...
Secure Computation for Privacy Preserving Data Mining
Chapter 267
Parvathi Chundi, Daniel J. Rosenkrantz
Time series data is usually generated by measuring and monitoring applications, and accounts for a large fraction of the data available for analysis...
Segmentation of Time Series Data
Chapter 268
Yawei Wang
The graying of America is one of the most significant demographic changes to the present and future of the United States (Moisey & Bichis, 1999). As...
Segmenting the Mature Travel Market with Data Mining Tools
Chapter 269
Protima Banerjee
Over the past few decades, data mining has emerged as a field of research critical to understanding and assimilating the large stores of data...
Semantic Data Mining
Chapter 270
Chrisa Tsinaraki
Several consumer electronic devices that allow capturing digital multimedia content (like mp3 recorders, digital cameras, DVD camcorders, smart...
Semantic Multimedia Content Retrieval and Filtering
Chapter 271
Ludovic Denoyer
Document classification developed over the last ten years, using techniques originating from the pattern recognition and machine learning...
Semi-Structured Document Classification
Chapter 272
Tobias Scheffer
For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs....
Semi-Supervised Learning
Chapter 273
Cane W.K. Leung
Sentiment analysis is a kind of text classification that classifies texts based on the sentimental orientation (SO) of opinions they contain....
Sentiment Analysis of Product Reviews
Chapter 274
Florent Masseglia, Maguelonne Teisseire, Pascal Poncelet
Sequential pattern mining deals with data represented as sequences (a sequence contains sorted sets of items). Compared to the association rule...
Sequential Pattern Mining
Chapter 275
K. G. Srinivasa, K. R. Venugopal, L. M. Patnaik
Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the...
Soft Computing for XML Data Mining
Chapter 276
Liping Jing, Michael K. Ng, Joshua Zhexue Huang
High-dimensional data is a phenomenon in real-world data mining applications. Text data is a typical example. In text mining, a text document is...
Soft Subspace Clustering for High-Dimensional Data
Chapter 277
Seoung Bum Kim, Chivalai Temiyasathit, Sun-Kyoung Park, Victoria C.P. Chen
Vast amounts of data are being generated to extract implicit patterns of ambient air pollution. Because air pollution data are generally collected...
Spatio-Temporal Data Mining for Air Pollution Problems
Chapter 278
Wenyuan Li
With the rapid growth of the World Wide Web and the capacity of digital data storage, tremendous amounts of data are generated daily from business...
Spectral Methods for Data Clustering
Chapter 279
Christophe Giraud-Carrier
With the growth and wide availability of the Internet, most retailers have successfully added the Web to their other, more traditional distribution...
Stages of Knowledge Discovery in E-Commerce Sites
Chapter 280
Claudio Conversano, Roberta Siciliano
Statistical Data Editing (SDE) is the process of checking and correcting data for errors. Winkler (1999) defines it as the set of methods used to edit...
Statistical Data Editing
Chapter 281
Maria Vardaki
The term metadata is frequently used in many different sciences. Statistical metadata is generally used to denote “every piece of information required...
Statistical Metadata Modeling and Transformations
Chapter 282
Concetto Elvio Bonafede
A statistical model is a possible representation (not necessarily complex) of a situation of the real world. Models are useful to give a good...
Statistical Models for Operational Risk
Chapter 283
Jun Zhu, Zaiqing Nie, Bo Zhang
The World Wide Web is a vast and rapidly growing repository of information. There are various kinds of objects, such as products, people...
Statistical Web Object Extraction
Chapter 284
Alexander Thomasian
Data storage requirements have consistently increased over time. According to the latest WinterCorp survey (http://www/, “The size of...
Storage Systems for Data Warehousing
Chapter 285
Ingrid Fischer
The amount of available data is increasing very fast. With this data, the desire for data mining is also growing. More and larger databases have to...
Subgraph Mining
Chapter 286
Jason Chen
Clustering analysis is a tool used widely in the Data Mining community and beyond (Everitt et al., 2001). In essence, the method allows us to...
Subsequence Time Series Clustering
Chapter 287
Mohammad Al Hasan
The research on mining interesting patterns from transactions or scientific datasets has matured over the last two decades. At present, numerous...
Summarization in Pattern Mining
Chapter 288
Ullas Nambiar
A query against incomplete or imprecise data in a database, or a query whose search conditions are imprecise, can both result in answers that do not...
Supporting Imprecision in Database Systems
Chapter 289
Barak Chizi, Lior Rokach, Oded Maimon
Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data...
A Survey of Feature Selection Techniques