Node Partitioned Data Warehouses: Experimental Evidence and Improvements

Pedro Furtado
ISBN13: 9781599049519 | ISBN10: 1599049511 | EISBN13: 9781599049526
DOI: 10.4018/978-1-59904-951-9.ch046

MLA

Furtado, Pedro. "Node Partitioned Data Warehouses: Experimental Evidence and Improvements." Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, edited by John Wang, IGI Global, 2008, pp. 718-737. https://doi.org/10.4018/978-1-59904-951-9.ch046

APA

Furtado, P. (2008). Node Partitioned Data Warehouses: Experimental Evidence and Improvements. In J. Wang (Ed.), Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications (pp. 718-737). IGI Global. https://doi.org/10.4018/978-1-59904-951-9.ch046

Chicago

Furtado, Pedro. "Node Partitioned Data Warehouses: Experimental Evidence and Improvements." In Data Warehousing and Mining: Concepts, Methodologies, Tools, and Applications, edited by John Wang, 718-737. Hershey, PA: IGI Global, 2008. https://doi.org/10.4018/978-1-59904-951-9.ch046

Abstract

Data Warehouses (DWs) holding large quantities of data pose major performance and scalability challenges, and parallelism can substantially improve performance in such a context. However, instead of costly specialized parallel hardware and interconnections, we focus on low-cost standard computing nodes, possibly in a non-dedicated local network. In this environment, partitioning and processing must be designed with special care. We use experimental evidence to analyze the shortcomings of a basic horizontal partitioning strategy designed for this environment, and then propose and test improvements that enable efficient data placement for the low-cost Node Partitioned Data Warehouse. We show experimentally that the extra overheads of processing large replicated relations and of repartitioning data between nodes can significantly degrade speedup for many query patterns. We analyze a simple, easy-to-apply partitioning and placement decision that yields good performance improvements. Our experiments and discussion provide important insight into partitioning and processing issues for data warehouses in shared-nothing environments.
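As a rough illustration of the two overheads the abstract names (large replicated relations and repartitioning between nodes), the following Python sketch assumes one common form of horizontal partitioning: a fact relation is hash-partitioned across nodes and a dimension relation is replicated to every node. It then counts how many fact rows would have to move if the facts had to be repartitioned on a different key, for example to co-locate a join. This is a generic sketch under those assumptions, not the chapter's method; the relation contents, the CRC32-based hash, and the helpers `node_for`, `hash_partition`, `replicate`, and `rows_to_move` are all hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key, num_nodes):
    """Map a partitioning-key value to a node (hypothetical hash: CRC32 of its string form)."""
    return crc32(str(key).encode("utf-8")) % num_nodes


def hash_partition(rows, key, num_nodes):
    """Horizontally partition rows (dicts) across num_nodes by the value of row[key]."""
    parts = defaultdict(list)
    for row in rows:
        parts[node_for(row[key], num_nodes)].append(row)
    return parts


def replicate(rows, num_nodes):
    """Place a full copy of a (dimension) relation on every node."""
    return {node: list(rows) for node in range(num_nodes)}


def rows_to_move(parts, new_key, num_nodes):
    """Count rows that would be shipped to another node if the relation
    were repartitioned on new_key instead of its current partitioning key."""
    moved = 0
    for node, rows in parts.items():
        for row in rows:
            if node_for(row[new_key], num_nodes) != node:
                moved += 1
    return moved


NUM_NODES = 4

# Hypothetical fact rows, partitioned on order_id.
facts = [{"order_id": i, "customer_id": i % 7, "amount": 10 * i} for i in range(100)]
parts = hash_partition(facts, "order_id", NUM_NODES)

# Hypothetical dimension rows, replicated to all nodes.
dims = [{"customer_id": c, "name": f"cust{c}"} for c in range(7)]
replicated = replicate(dims, NUM_NODES)

print("fact rows per node:", {n: len(r) for n, r in sorted(parts.items())})
print("replicated dimension rows stored:", sum(len(v) for v in replicated.values()))  # 28
print("rows to move if repartitioned on customer_id:",
      rows_to_move(parts, "customer_id", NUM_NODES))
```

In this sketch, replication lets each node join its fact fragment with the dimension locally, without moving data, but it multiplies dimension storage and scan work by the number of nodes. A query that needs the facts grouped or joined on a key other than the partitioning key instead pays for shipping rows between nodes.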