Experience Report: A Component-Based Data Management and Knowledge Discovery Framework for Aviation Studies

M. Brian Blake (Georgetown University and The MITRE Corporation, USA), Lisa Singh (Georgetown University, USA), Andrew B. Williams (Spelman University, USA), Wendell Norman (The MITRE Corporation, USA) and Amy L. Sliva (Georgetown University, USA)
DOI: 10.4018/978-1-60566-418-7.ch016

Abstract

Organizations are beginning to apply data mining and knowledge discovery techniques to their corporate data sets, thereby enabling the identification of trends and the discovery of inductive knowledge. Traditional transactional databases are often not optimized for analytical processing and must be transformed. This article proposes the use of modular components to decrease the overall amount of human processing and intervention necessary for the transformation process. Our approach configures components to extract data sets using a set of “extraction hints”. Our framework incorporates decentralized, generic components that are reusable across domains and databases. Finally, we detail an implementation of our component-based framework for an aviation data set.

Introduction

Over the past decade, government and industry organizations have enhanced their operations by utilizing emerging technologies in data management. Advances in database methodology and software (i.e., warehousing of transactional data) have increased the ability of organizations to extract useful knowledge from operational data and have helped build the foundation for the field of knowledge discovery in databases (KDD) (Fayyad, 1996; Sarawagi, 2000; Software Suites supporting Knowledge Discovery, 2005). KDD consists of five phases: selection, pre-processing, transformation, data mining, and interpretation/evaluation. Selection involves identifying the data that should be used for the data mining process. Typically, the data is obtained from multiple heterogeneous data sources. The pre-processing phase includes steps for data cleansing and the development of strategies for handling missing data and various data anomalies. Data transformation involves converting data from the different sources into a single common format. This step also includes using data reduction techniques to reduce the complexity of the selected data, thereby simplifying future steps in the KDD process. Data mining tasks apply various algorithms to the transformed data to generate and identify “hidden knowledge”. Finally, the interpretation/evaluation phase focuses on creating an accurate and clear presentation of the data mining results for the user.
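The five phases described above can be sketched as a small pipeline. This is only an illustrative sketch, not the C-KDD framework itself: the flight records, field names, the mean-fill strategy for missing values, and the delay-counting "mining" step are all assumptions made up for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw records from two heterogeneous sources (illustrative data only).
SOURCE_A = [
    {"flight": "F1", "carrier": "AA", "delay_min": 12},
    {"flight": "F2", "carrier": "AA", "delay_min": None},  # missing value
    {"flight": "F3", "carrier": "BB", "delay_min": 45},
]
SOURCE_B = [
    ("F4", "BB", "30"),  # different layout; delay stored as text
    ("F5", "AA", "0"),
]


def select():
    """Selection: identify the data to be mined from each source."""
    return SOURCE_A, SOURCE_B


def preprocess(source_a, source_b):
    """Pre-processing: fill missing delays with the mean of the known ones (assumed strategy)."""
    known = [r["delay_min"] for r in source_a if r["delay_min"] is not None]
    fill = mean(known)
    cleaned_a = [
        dict(r, delay_min=fill if r["delay_min"] is None else r["delay_min"])
        for r in source_a
    ]
    return cleaned_a, source_b


def transform(source_a, source_b):
    """Transformation: convert both layouts to one common (carrier, delay) format."""
    rows = [(r["carrier"], float(r["delay_min"])) for r in source_a]
    rows += [(carrier, float(delay)) for _, carrier, delay in source_b]
    return rows


def mine(rows, threshold=15.0):
    """Data mining: stand-in algorithm that counts delayed flights per carrier."""
    return Counter(carrier for carrier, delay in rows if delay > threshold)


def interpret(counts):
    """Interpretation/evaluation: present the result in readable form."""
    return [f"{carrier}: {n} delayed flight(s)" for carrier, n in sorted(counts.items())]


def run_kdd():
    a, b = select()
    a, b = preprocess(a, b)
    return interpret(mine(transform(a, b)))
```

Calling `run_kdd()` runs the phases in order. Each phase is a separate function so that any one of them could, in principle, be replaced by a reusable component.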

Excluding the data mining phase, for which a plethora of automated algorithms and applications exists, the phases are mostly human-driven. Data experts are required to complete the tasks related to the majority of steps in the KDD process, as explained below.

  • Data Formatting, Loading, Cleaning and Anomaly Detection. In the pre-processing phase, data experts must correct and update incorrect data values, populate missing data values, and fix data anomalies.

  • Adding Important Meta-Data to the Database. In the data transformation phase, data must be integrated into a single model that supports analytical processing. This typically involves adding meta-data and converting data sets from text files and traditional relational schemas to star or multidimensional schemas.

  • User and Tool-Generated Hints. In the final phases (i.e. data mining and evaluation), general approaches are needed to assist users in preparing knowledge discovery routines and analyzing results. These general approaches must allow the user to manually specify potential correlation areas or “hints”. In the future, the suggestion of new hints may be automated by intelligent software mechanisms.
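As a rough illustration of the last item, a user-specified hint could be represented as a pair of attributes suspected to be correlated, and a generic component could use a set of hints to decide which attributes to extract. The `Hint` class, the `extract` function, and the attribute names below are hypothetical and are not taken from the C-KDD framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """A pair of attributes the user suspects are correlated (hypothetical structure)."""
    left: str
    right: str


def extract(records, hints):
    """Project each record onto only the attributes named by the hints."""
    wanted = []
    for hint in hints:
        for name in (hint.left, hint.right):
            if name not in wanted:  # keep first-seen order, avoid duplicates
                wanted.append(name)
    return [{name: record[name] for name in wanted} for record in records]


# Illustrative records; the field names are assumptions for this sketch.
records = [
    {"flight": "F1", "weather": "fog", "delay_min": 40, "gate": "B2"},
    {"flight": "F2", "weather": "clear", "delay_min": 5, "gate": "A1"},
]
hints = [Hint("weather", "delay_min")]
subset = extract(records, hints)
```

Here `subset` keeps only the `weather` and `delay_min` attributes of each record, so later mining steps see a smaller data set focused on the suggested correlation area.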

These human-driven tasks pose problems since the initial data set, which we will refer to as the raw data, is large, complex and heterogeneous. Our work attempts to reduce the amount of time required for human-driven tasks in the KDD setting. General reusable components may represent a feasible solution to assist in the execution of the time-consuming processing tasks underlying KDD. In this paper, specific tasks suitable for such components are identified and characterized. In addition, a component-based framework and corresponding process are described to address these tasks.

The next section discusses related work on component-based KDD. The paper then introduces the Component-Based Knowledge Discovery in Databases (C-KDD) framework. Subsequent sections provide low-level technical details of the C-KDD framework and, in the final sections, the framework is applied in an aviation-based study.
