Data Mining for High Performance Computing

Shen Lu
Copyright: © 2015 | Pages: 19
DOI: 10.4018/978-1-4666-7461-5.ch014

Abstract

As information technology advances, datasets keep growing. Distributed data processing addresses the problem of analyzing large datasets: it partitions a dataset into a large number of subsets and uses different processors to store, manage, broadcast, and synchronize the data analysis. However, distributed computing gives rise to new problems, such as the impracticality of global communication and global synchronization, dynamic changes in network topology, on-the-fly data updates, the need to share resources with other applications, frequent failures, and resource recovery. This chapter introduces the concepts of distributed computing, presents the latest research, analyzes the advantages and disadvantages of different technologies and systems, and summarizes future trends in distributed computing.

Data Mining

Data Mining as the Evolution of Information Technology

Data mining can be used to integrate, manage, analyze, and predict information. With the development of the World Wide Web, data storage devices, and data collection machines, a vast amount of information is collected each day from business, science, medicine, and almost every other aspect of daily life. This fast-growing, tremendous amount of data, collected and stored in large and numerous data repositories, has far exceeded the human ability to comprehend it without powerful tools. We need to understand data, use it to help make decisions, and find interesting knowledge in it. Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. It is one step in the broader process of knowledge discovery, which comprises data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
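
The knowledge discovery steps listed above can be sketched as a small pipeline. The following is only an illustrative sketch, not the chapter's own method: the function names, the record layout (dictionaries with hypothetical "id" and "items" fields), the toy data, and the choice of co-occurring item pairs as the "pattern" are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def clean(records):
    """Data cleaning: drop records with missing fields and exact duplicates."""
    seen, kept = set(), []
    for r in records:
        if r.get("id") is None or not r.get("items"):
            continue  # missing id or empty item list
        key = (r["id"], tuple(sorted(r["items"])))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for src in sources for r in src]


def select(records):
    """Data selection: keep only the attribute relevant to the task."""
    return [r["items"] for r in records]


def transform(baskets):
    """Data transformation: normalize item names and convert to sets."""
    return [frozenset(i.strip().lower() for i in b) for b in baskets]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for b in baskets:
        counts.update(combinations(sorted(b), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_baskets
            for pair, c in counts.items()
            if c / n_baskets >= min_support}


def present(patterns):
    """Knowledge presentation: render patterns as readable lines."""
    return [f"{a} + {b}: support {s:.2f}"
            for (a, b), s in sorted(patterns.items())]


def run_pipeline(*sources, min_support=0.5):
    """Run the seven steps in the order given in the text."""
    merged = integrate(*(clean(s) for s in sources))
    baskets = transform(select(merged))
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)


# Toy data (hypothetical): two sources with a duplicate and an empty record.
store_a = [
    {"id": 1, "items": ["Bread", "Milk"]},
    {"id": 1, "items": ["Bread", "Milk"]},  # exact duplicate, removed
    {"id": 2, "items": []},                 # no items, removed
]
store_b = [
    {"id": 3, "items": ["bread", "milk", "eggs"]},
    {"id": 4, "items": ["eggs", "milk "]},
]

report = run_pipeline(store_a, store_b)
for line in report:
    print(line)
```

Keeping each step as a separate function mirrors the staged description in the text and makes it easy to replace one stage, for example a different mining algorithm, without touching the others.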
