Empowering the OLAP Technology to Support Complex Dimension Hierarchies

Svetlana Mansmann
Copyright: © 2009 |Pages: 21
DOI: 10.4018/978-1-60566-098-1.ch022

Abstract

Comprehensive data analysis has become indispensable in a variety of domains. OLAP (On-Line Analytical Processing) systems tend to perform poorly or even fail when applied to complex data scenarios. The restriction of the underlying multidimensional data model to admit only homogeneous and balanced dimension hierarchies is too rigid for many real-world applications and therefore has to be overcome in order to provide adequate OLAP support. We present a framework for classifying and modeling complex multidimensional data, with the major effort at the conceptual level directed at transforming irregular hierarchies so that they can be navigated in a uniform manner. The properties of various hierarchy types are formalized and a two-phase normalization approach is proposed: heterogeneous dimensions are reshaped into a set of well-behaved homogeneous subdimensions, followed by the enforcement of summarizability in each dimension's data hierarchy. Mapping the data to a visual data browser relies solely on metadata, which captures the properties of facts, dimensions, and relationships within the dimensions. The navigation is schema-based; that is, users interact with dimensional levels, with data displayed on demand. The power of our approach is exemplified using a real-world case study from the domain of academic administration.

Introduction

Data Warehouses (DWs) are repositories that typically store large amounts of data extracted and integrated from transactional systems and various other operational sources. These repositories support online analytical processing (OLAP) and data mining. Typical queries include both standard reporting and ad hoc analysis; they are usually complex, access very large volumes of data, and perform time-consuming aggregations. Although data warehouses can easily reach many gigabytes or terabytes, users still require fast answers to their analyses, so performance becomes a major concern in these systems. Although structures such as materialized views and specialized indexes improve response times for predicted queries, parallel processing can be used alone or in conjunction with those structures to offer a major performance boost and to guarantee speedup and scale-up, even for unpredicted ad hoc queries.

Parallel database systems are implemented using one of the following parallel architectures: shared-memory, shared-disk, shared-nothing, hierarchical, or NUMA (Valduriez & Ozsu, 1999). Each choice has implications for parallel query processing algorithms and data placement. In practice, parallel environments involve several extra overheads related to data and control exchanges between processing units, as well as to storage, so all components of the system need to be designed to avoid bottlenecks that would compromise overall processing efficiency. Some parts of the system may even have to accommodate the aggregate flow into and from all units. For instance, in shared-disk systems, the storage subsystem, including controllers and connections to storage, has to be fast enough to handle the aggregate of all accesses without becoming a significant bottleneck for I/O-bound applications. Handling such bottlenecks typically requires specialized, fast, fully dedicated parallel hardware and interconnects. An attractive alternative is to use a number of low-cost computer nodes in a shared-nothing environment, possibly on a non-dedicated local network, and to design the system with special care for partitioning and processing. In such an environment, each node runs a basic database engine, and the system includes a middle layer that provides parallelism to the whole environment. The Node Partitioned Data Warehouse (NPDW) is a generic architecture for partitioning and processing query-intensive data in such an environment. One of its objectives is to minimize the dependency on very fast, dedicated computing and data exchange infrastructures by optimizing partitioning and making use of replication whenever useful.
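The middle-layer idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the NPDW implementation: each shared-nothing node runs an independent local "engine" over its fragment of the data, and a coordinator merges the partial results. The names `run_local` and `merge_partials` are illustrative assumptions.

```python
# Hypothetical sketch of a shared-nothing middle layer: each node computes a
# partial aggregate over its local fragment; a coordinator merges the partials.
from collections import Counter

def run_local(fragment):
    """Local engine: partial SUM(amount) grouped by region on one node."""
    partial = Counter()
    for row in fragment:
        partial[row["region"]] += row["amount"]
    return partial

def merge_partials(partials):
    """Coordinator: additive aggregates merge by simple summation."""
    result = Counter()
    for p in partials:
        result.update(p)  # Counter.update adds counts, it does not overwrite
    return dict(result)

# Three shared-nothing nodes, each holding a fragment of the fact data.
fragments = [
    [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}],
    [{"region": "EU", "amount": 7}],
    [{"region": "US", "amount": 3}, {"region": "EU", "amount": 1}],
]
partials = [run_local(f) for f in fragments]  # would run in parallel in practice
print(merge_partials(partials))               # {'EU': 18, 'US': 8}
```

This pattern works for distributive aggregates such as SUM and COUNT, which is one reason query-intensive warehouse workloads parallelize well over shared-nothing nodes.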

DeWitt and Gray (1992) review the major issues in parallel database systems implemented over conventional shared-nothing architectures. One of the major concerns with such an architecture is deciding how to partition or cluster relations across nodes, which raises the issue of determining the most appropriate partitioning and placement choice for a given schema. Data warehouses are a specialized type of database with specific characteristics and requirements that can inform the partitioning and placement decision: they are mostly read-only, periodically loaded, centralized repositories of data, so replication-related consistency issues are minor compared to those of full-blown transactional systems.

The star schema (Kimball, 1996) is the typical data organization in a data warehouse, representing multidimensional data with a large central fact table and smaller dimension tables. Fact tables are typically very large relations holding hundreds of gigabytes of historical detail. Dimensions are smaller relations that identify entities by means of several descriptive properties.
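To make the fact/dimension split concrete, here is a minimal star schema sketched as plain Python relations. The table and column names (`sales_fact`, `date_dim`, and so on) are hypothetical, not taken from the chapter.

```python
# Illustrative star schema: a fact relation keyed by surrogate keys into
# small descriptive dimension relations.
date_dim = {1: {"year": 2008, "month": "Jan"}, 2: {"year": 2008, "month": "Feb"}}
product_dim = {10: {"name": "Widget", "category": "Tools"}}

# Fact rows: foreign keys into the dimensions plus numeric measures.
sales_fact = [
    {"date_key": 1, "product_key": 10, "amount": 120.0},
    {"date_key": 2, "product_key": 10, "amount": 80.0},
]

# A typical OLAP query joins the fact with a dimension and aggregates:
# here, total sales amount per month.
totals = {}
for row in sales_fact:
    month = date_dim[row["date_key"]]["month"]
    totals[month] = totals.get(month, 0.0) + row["amount"]
print(totals)  # {'Jan': 120.0, 'Feb': 80.0}
```

The asymmetry shown here (a huge, append-mostly fact versus small, descriptive dimensions) is exactly what the placement strategy below exploits.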

In this context, a basic placement strategy for the simple star schema replicates the dimensions and horizontally partitions the large central fact table at random. Figure 1 illustrates this simple placement strategy: the large fact table F is partitioned into node fragments Fi, and the dimensions D are replicated on all nodes. Very small dimensions can even be cached in memory for faster access and join processing.
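The placement strategy above can be sketched as follows. This is a simplified illustration under assumed names: round-robin assignment stands in for random placement of fact rows, and every node receives a full copy of the dimensions.

```python
# Sketch of the star-schema placement strategy: the fact table is horizontally
# partitioned across nodes (round-robin here, standing in for random
# placement) while every dimension is replicated on every node.
import itertools

NUM_NODES = 3

def place(fact_rows, dimensions, num_nodes=NUM_NODES):
    nodes = [{"fact_fragment": [], "dimensions": dict(dimensions)}
             for _ in range(num_nodes)]                  # dimensions replicated
    for row, node_id in zip(fact_rows, itertools.cycle(range(num_nodes))):
        nodes[node_id]["fact_fragment"].append(row)      # fact partitioned
    return nodes

fact = [{"sale_id": i, "amount": i * 1.0} for i in range(7)]
dims = {"store": {1: "Lisbon", 2: "Coimbra"}}
nodes = place(fact, dims)
print([len(n["fact_fragment"]) for n in nodes])          # [3, 2, 2]
```

Because each node holds all dimensions locally, a fact-to-dimension join never requires moving dimension data between nodes; only the partial aggregates over the fact fragments need to be exchanged.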

Figure 1. Partitioning the star schema
