Index Structures for Data Warehousing and Big Data Analytics

Veit Köppen, Martin Schäler, David Broneske
Copyright: © 2019 | Pages: 16
DOI: 10.4018/978-1-5225-5516-2.ch008

Abstract

As the amount of data keeps growing, these data have to be processed to gain new insights. Data mining techniques and user-driven OLAP are used to identify patterns or rules. Typical OLAP queries require database operations such as selections on ranges or projections, and data mining techniques require efficient support of the same operations. One particularly challenging, yet important, property that efficient data access has to support is multi-dimensionality. New techniques have been developed that take advantage of modern hardware, including SIMD instructions and main-memory processing. These include sequential data access methods such as SIMD scans, BitWeaving, and Column Imprints, as well as new data structures such as Sorted Projections and the Elf, which address both the features of modern hardware and multi-dimensional data access. This chapter gives an overview of existing techniques and open potentials.
Chapter Preview

Scope Of This Article

As the amount of data keeps growing, these data have to be processed to gain new insights, which requires a solution to store and query multidimensional data efficiently. In relational databases, index structures like the B-tree (Bayer & McCreight, 1972) have improved data access drastically. In a multidimensional domain, which is common for data warehouses as well as big data applications, such index structures are either limited to a specialized scenario or do not scale sufficiently. Furthermore, Berchtold et al (1998) describe the curse of dimensionality, an important insight for multidimensional data access. As a result, many approaches and most present-day database systems rely on optimized sequential scans that exploit the capabilities of modern hardware, and consequently on the optimization of sequential scans over multidimensional data to support OLAP (Online Analytical Processing) analyses.
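To make the scan-based alternative concrete, the following minimal C++ sketch (with hypothetical column names and query ranges, not taken from the chapter) evaluates a two-dimensional range predicate in a single pass over two column arrays. A B-tree on a single attribute could prune on only one of the two ranges, whereas the scan evaluates both predicates per tuple; because the predicate is computed without short-circuit branches, the loop stays amenable to compiler auto-vectorization (SIMD).

#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the positions of all rows whose price and quantity fall into the
// requested ranges. Both range predicates are evaluated in one pass; the
// conjunction uses bitwise & instead of && to avoid short-circuit branches.
std::vector<uint32_t> scan_select(const std::vector<int32_t>& price,
                                  const std::vector<int32_t>& quantity,
                                  int32_t price_lo, int32_t price_hi,
                                  int32_t qty_lo, int32_t qty_hi) {
    std::vector<uint32_t> result;                  // qualifying row positions
    for (std::size_t i = 0; i < price.size(); ++i) {
        const bool hit = (price[i] >= price_lo) & (price[i] <= price_hi)
                       & (quantity[i] >= qty_lo) & (quantity[i] <= qty_hi);
        if (hit) result.push_back(static_cast<uint32_t>(i));
    }
    return result;
}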

Predicate evaluation is a challenging task in the OLAP domain (Johnson et al, 2008); it is required, among others, for slice operations on the data cube. More generally, to extract the relevant data for further analysis, fact and dimension tables are filtered by several selection predicates involving several dimension attributes. Shrinking the amount of processed data as early as possible has become an important task when all data fit into main memory; the I/O bottleneck is then eliminated and a full table scan becomes less expensive. Therefore, we focus on how all approaches support this operation.
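The following sketch (again with a hypothetical fact-table schema and made-up predicate values) illustrates one common way to shrink the processed data early: the predicates are evaluated column at a time against a position list, so every predicate after the first only touches the rows that survived the previous ones.

#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using Positions = std::vector<uint32_t>;

// Keeps only those positions whose column value satisfies the predicate.
Positions filter(const Positions& in, const std::vector<int32_t>& column,
                 const std::function<bool(int32_t)>& predicate) {
    Positions out;
    for (uint32_t pos : in)
        if (predicate(column[pos])) out.push_back(pos);
    return out;
}

// Hypothetical slice: restrict a fact table by product, region, and year.
Positions slice(const std::vector<int32_t>& year,
                const std::vector<int32_t>& region,
                const std::vector<int32_t>& product) {
    Positions candidates(year.size());
    std::iota(candidates.begin(), candidates.end(), 0u);   // start with all rows
    // Apply the (presumably) most selective predicate first to shrink early.
    candidates = filter(candidates, product, [](int32_t v) { return v == 42; });
    candidates = filter(candidates, region,  [](int32_t v) { return v == 7; });
    candidates = filter(candidates, year,    [](int32_t v) { return v >= 2015 && v <= 2017; });
    return candidates;                                      // surviving fact-table rows
}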

In case all data are available in main memory (e.g., in a main-memory database system (Boncz et al, 2008; Kemper et al, 2001; Plattner, 2009)), the selectivity threshold at which an index structure beats an optimized full table scan is even smaller than for disk-based database systems. In a recent study (Dasa et al, 2015), the authors propose to use an index structure only for very low selectivities, such as values below 2%. Hence, most OLAP queries would never use an index structure to evaluate their selection predicates. However, this approach neglects an interesting fact: the accumulated selectivity of several selection predicates is far lower than that of any single predicate, provided the access method exploits the relation between all selection predicates of the query. Consequently, when considering multi-dimensional queries, we do achieve the selectivity required to use an index structure instead of an accelerated scan.
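A minimal sketch of this back-of-the-envelope argument, assuming three independent predicates with made-up per-predicate selectivities of 20% each and the 2% threshold taken from the study cited above:

#include <cstdio>

// For independent conjunctive predicates, the combined selectivity is the
// product of the individual selectivities. The 20% values are illustration
// values only; the 2% threshold is the figure mentioned in the cited study.
int main() {
    const double selectivities[] = {0.20, 0.20, 0.20};
    double combined = 1.0;
    for (double s : selectivities) combined *= s;
    std::printf("combined selectivity: %.2f%%\n", combined * 100.0);   // 0.80%
    std::printf("index pays off (< 2%%): %s\n", combined < 0.02 ? "yes" : "no");
}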

Another consequence for indexes that fit completely into main memory is that the structures should carefully consider two aspects: optimization for the restrictions of CPU caches and the opportunity of multi-core parallelism (Faerber et al, 2016). Modern systems such as C-Store, HyPer, and SAP HANA do not use page-based indirection but use the available storage efficiently. Instead, a pointer directly addresses the record, and separate record identifiers are omitted. Consequently, we illustrate how all approaches make efficient use of present-day hardware.
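As a rough illustration of this difference, assuming hypothetical types, the sketch below contrasts a disk-based record identifier, which has to be resolved to a page via the buffer manager, with a main-memory table in which an array offset or raw pointer addresses the record directly.

#include <cstddef>
#include <cstdint>
#include <vector>

struct Record { int32_t key; int32_t value; };

// Disk-based style: a record identifier that must be resolved through the
// buffer manager (page lookup, then slot lookup) before the record is reached.
struct RecordId { uint32_t page_id; uint16_t slot_id; };

// Main-memory style: records live in one contiguous, cache-friendly array,
// so an offset (or a raw pointer) addresses the record without indirection.
struct MainMemoryTable {
    std::vector<Record> records;
    const Record* at(std::size_t offset) const { return &records[offset]; }
};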

Altogether, there is a wide range of different approaches, each with its own advantages and limitations. In this chapter, we aim at giving an overview of how such approaches work and how they relate to each other. The chapter particularly addresses readers who are just getting into the domain of efficient data access methods on modern hardware. To this end, we introduce the different approaches in the next section, using one example data set to illustrate the differences between them. After that, we show what an exemplary evaluation of the different approaches looks like and give a first indication of the strengths and weaknesses of each approach.
