Parallel and Distributed Data Mining through Parallel Skeletons and Distributed Objects

Massimo Coppola, Marco Vanneschi
Copyright: © 2003 | Pages: 36
ISBN13: 9781591400516 | ISBN10: 1591400511 | EISBN13: 9781591400950
DOI: 10.4018/978-1-59140-051-6.ch005

MLA

Coppola, Massimo, and Marco Vanneschi. "Parallel and Distributed Data Mining through Parallel Skeletons and Distributed Objects." Data Mining: Opportunities and Challenges, edited by John Wang, IGI Global, 2003, pp. 106-141. https://doi.org/10.4018/978-1-59140-051-6.ch005

APA

Coppola, M. & Vanneschi, M. (2003). Parallel and Distributed Data Mining through Parallel Skeletons and Distributed Objects. In J. Wang (Ed.), Data Mining: Opportunities and Challenges (pp. 106-141). IGI Global. https://doi.org/10.4018/978-1-59140-051-6.ch005

Chicago

Coppola, Massimo, and Marco Vanneschi. "Parallel and Distributed Data Mining through Parallel Skeletons and Distributed Objects." In Data Mining: Opportunities and Challenges, edited by John Wang, 106-141. Hershey, PA: IGI Global, 2003. https://doi.org/10.4018/978-1-59140-051-6.ch005

Abstract

We consider the application of parallel programming environments to the development of portable and efficient high-performance data mining (DM) tools. We first assess the need for parallel and distributed DM applications by pointing out the scalability problems of some mining techniques and the need to mine large, possibly geographically distributed databases. We discuss the main issues in exploiting parallel and distributed computation for DM algorithms. A high-level programming language enhances the software engineering aspects of parallel DM and simplifies integration with existing sequential and parallel data management systems, thus leading to programming-efficient and high-performance implementations of applications. We describe a programming environment we have implemented that is based on the parallel skeleton model, and we examine the addition of object-like interfaces to external libraries and system software layers. Abstractions of this kind will be included in the forthcoming programming environment ASSIST. In the main part of the chapter, as a proof of concept, we describe three well-known DM algorithms: Apriori, C4.5, and DBSCAN. For each problem, we explain the sequential algorithm and a structured parallel version, which is discussed and compared to parallel solutions found in the literature. We also discuss the potential gain in performance and expressiveness from the addition of external objects, on the basis of the experiments we have performed so far. We evaluate the approach with respect to performance results and to design and implementation considerations.
