A Two-Tiered Segmentation Approach for Transaction Data Warehousing

Xiufeng Liu, Huan Huo, Nadeem Iftikhar, Per Sieverts Nielsen

Source Title: Emerging Perspectives in Big Data Warehousing

DOI: 10.4018/978-1-5225-5516-2.ch001

Abstract

Data warehousing populates data from different source systems into a central data warehouse (DW) through extraction, transformation, and loading (ETL). Massive transaction data are routinely recorded in a variety of applications such as retail commerce, bank systems, and website management. Transaction data record the timestamp and relevant reference data needed for a particular transaction record. It is a non-trivial task for a standard ETL to process transaction data with dependencies and high velocity. This chapter presents a two-tiered segmentation approach for transaction data warehousing. The approach uses a so-called two-staging ETL method to process detailed records from operational systems, followed by a dimensional data process to populate the data store with a star or snowflake schema. The proposed approach is an all-in-one solution capable of processing fast/slowly changing data and early/late-arriving data. This chapter evaluates the proposed method, and the results have validated the effectiveness of the proposed approach for processing transaction data.

Introduction

A data warehouse is a decision-support database that holds data extracted from transaction systems, operational data stores, or other external source systems. The process of moving data from source systems into a central data warehouse is traditionally referred to as data warehousing (Kimball & Caserta, 2004). The transformed data in a data warehouse (Inmon, 2002) are typically stored in tables with a star schema and accessed by decision-support systems, such as Online Analytical Processing (OLAP) tools and Business Intelligence (BI) applications (March & Hevner, 2007). Data warehousing systems run ETL jobs at regular intervals, such as daily, weekly, or monthly. Operational data management systems create dynamic data through transactions. Transaction data are increasingly common across a variety of applications, such as telecommunications, bank systems, retail commerce, and website management. Transaction data consist of records of individuals and events, and can change during business operations, e.g., when new orders are added or existing orders are updated or cancelled. These changes are propagated to the data warehouse to support decision making. The detailed transaction records in a data warehouse can trace the operations of an operational processing system; such a warehouse is called a System of Record (SOR) (Inmon, 2003). Usually, an ETL processes transaction records in their order of arrival (e.g., by timestamp) and according to the dependencies between records, if any exist. For example, when loading data into a data warehouse with a star schema, the dimension records are usually loaded first and the fact records second, because the fact records reference the dimension records through foreign keys. If a fact record arrives first, the lookup of its dimension key will fail. Another example is loading data into snowflake schema tables, where foreign-key references also exist between the normalized tables.
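The failing lookup for an early-arriving fact record can be illustrated with a minimal sketch. It is not the chapter's method: the in-memory tables, the names `customer_dim`, `fact_sales`, and `pending_facts`, and the decision to park unresolved facts in a list are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical in-memory dimension table: business key -> surrogate key.
customer_dim: dict[str, int] = {}
# Hypothetical fact table: (dimension surrogate key, amount).
fact_sales: list[tuple[int, float]] = []
# Assumed holding area for facts whose dimension row has not arrived yet.
pending_facts: list[tuple[str, float]] = []


def load_dimension(business_key: str) -> int:
    """Insert a dimension row if needed and return its surrogate key."""
    if business_key not in customer_dim:
        customer_dim[business_key] = len(customer_dim) + 1
    return customer_dim[business_key]


def load_fact(business_key: str, amount: float) -> bool:
    """Load a fact row; return False if the dimension key lookup fails."""
    surrogate_key: Optional[int] = customer_dim.get(business_key)
    if surrogate_key is None:
        # The referenced dimension row does not exist yet.
        pending_facts.append((business_key, amount))
        return False
    fact_sales.append((surrogate_key, amount))
    return True


# The fact arrives before its dimension record, so the lookup fails.
early_ok = load_fact("C042", 19.99)
# Once the dimension record is loaded, the same lookup succeeds.
load_dimension("C042")
late_ok = load_fact("C042", 5.00)
```

In this sketch the first call returns `False` and leaves the record in `pending_facts`, while the second call resolves the surrogate key and appends a row to `fact_sales`.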
The standard approach for loading snowflake schema tables is to load the parent (referenced) tables first, then the child (referencing) tables. Because this approach loads data according to table dependencies, it has several weaknesses: it requires extra space for storing early-arriving data; fact data loading cannot proceed until the dimension data have been loaded; and parallel loading becomes difficult. Another challenge in data warehousing is how to load fast- and slowly-changing data. For example, to load slowly changing dimension (SCD) data (Kimball & Caserta, 2004), the traditional approach is first to check the history records in the DW, then to update the date attributes of those records, and finally to add a new record. All of these are challenges for a transaction data warehousing system.
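The three traditional steps for slowly changing dimension data (check the history, update the date attributes, add a new record) can be sketched as follows. This resembles what is commonly called a Type 2 update. The `DimRow` class, its fields, and the function `apply_scd_change` are hypothetical names chosen for illustration, and an in-memory list stands in for the dimension table in the DW.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    business_key: str
    city: str                        # assumed tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd_change(history: list[DimRow], business_key: str,
                     city: str, change_date: date) -> None:
    # Step 1: check the history for the current version of this key.
    current = next(
        (row for row in history
         if row.business_key == business_key and row.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return  # attribute unchanged, nothing to do
        # Step 2: update the date attribute to close the old version.
        current.valid_to = change_date
    # Step 3: add a new record that becomes the current version.
    history.append(DimRow(business_key, city, change_date))


history: list[DimRow] = []
apply_scd_change(history, "C042", "Aalborg", date(2017, 1, 1))
apply_scd_change(history, "C042", "Copenhagen", date(2018, 3, 1))
```

After the two calls, `history` holds two versions of the same business key: the first is closed on the change date and the second is open. Each change costs a read of the existing history plus an update and an insert, which is part of why this path is expensive for fast-changing data.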

This chapter proposes a two-tiered segmentation solution for a transaction data warehousing system. The proposed solution first uses a two-staging ETL to process detailed transaction records into an SOR data warehouse (Tier-1 segmentation), then uses a second ETL process, called the DDS process, to populate the dimensional data store (DDS) (Tier-2 segmentation). The two-staging ETL is responsible for populating the data from operational source systems into the SOR data warehouse, while the DDS process is responsible for populating the data from the SOR into a multi-dimensional data store. The two segmentations have a similar structure, in which an additional data store is introduced for the ETL. The purpose of this design is to ease ETL optimizations, for example, to implement parallelization and to lower the complexity of data transformation. Moreover, this design is a one-stop solution for handling early/late-arriving data and fast/slowly-changing data, as discussed later in the chapter. The solution is more efficient and less intrusive than the standard approach, which makes it particularly favorable for processing transaction data.

In summary, this chapter makes the following contributions: 1) the authors propose a novel two-tiered segmentation approach for a transaction data warehousing system; 2) the authors propose an all-in-one method for handling fast/slowly-changing data and early/late-arriving data, which eases the maintenance and optimization of an ETL; 3) the authors propose a less intrusive ETL method with a fast loading step, which can effectively reduce the downtime of a business intelligence system; 4) the authors propose an augmentation process for handling early/late-arriving data; and 5) the proposed approach can decouple ETL dependencies, which makes it possible to parallelize data loading.
