Principled Reference Data Management for Big Data and Business Intelligence

Sushain Pandit, Ivan Milman, Martin Oberhofer, Yinle Zhou
DOI: 10.4018/IJOCI.2017010104


Most large enterprises that rely on operational business processes use several thousand instances of legacy, upgraded, cloud-based, and/or acquired information management applications. With the advent of Big Data, Business Intelligence (BI) systems receive unconsolidated data from a wide range of data sources, with no overarching governance procedures to ensure quality and consistency. Although different applications deal with their own flavor of data, reference data is found in all of them. BI plays a critical role in business success, and it relies heavily on data quality for the intelligence it provides to be trustworthy. Given this dependence and the prevalence of reference data across the information integration landscape, a principled approach to the management, stewardship, and governance of reference data becomes necessary to ensure quality and operational excellence across BI systems. The authors discuss this approach in the context of typical reference data management concepts and features, leading to a comprehensive solution architecture for BI integration.

Background And Introduction

An organization’s approach to reference data management (RDM) can have a profound influence on the effectiveness of its business intelligence processes (Chisholm, 2000). Reference data are the data values that support consistent representation (coding) of key information within an organization or across groups of organizations. Reference data are usually sets of permissible values for an attribute (Xu, et al., 2012) or classification schemas that are referred to by system applications, data stores, processes, and reports, as well as by transactional and master records (McGilvray, 2008). These may be standard codes defined outside the organization, such as currency codes and country codes (for example, those defined by ISO, the International Organization for Standardization); domain-specific standards to which the organization subscribes, such as the Logical Observation Identifiers Names and Codes (LOINC 1999, 2013) standard for coding medical laboratory observations; or internal standards, such as internally developed product codes. Such reference data sets, comprising a range of permissible values for entity attributes, usually reside in specialized tables known as look-up, code, check, or domain tables (Xu, et al., 2012).
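
To make the notion of a code table concrete, here is a minimal Python sketch (not taken from the article) of a reference data set held as a look-up table of permissible values. The `CodeTable` class and its methods are hypothetical names; the currency entries are only an illustrative subset of ISO 4217, and the product line codes are invented stand-ins for an internally developed standard.

```python
from dataclasses import dataclass, field


@dataclass
class CodeTable:
    """A reference data set: the permissible values for one attribute."""

    name: str  # attribute or domain the table governs, e.g. "currency"
    standard: str  # external or internal standard the codes follow
    codes: dict[str, str] = field(default_factory=dict)  # code -> description

    def is_permissible(self, value: str) -> bool:
        # A value is valid only if it appears as a code in the table.
        return value in self.codes

    def describe(self, value: str) -> str:
        # Return the human-readable label; raises KeyError for unknown codes.
        return self.codes[value]


# Externally defined standard (illustrative subset of ISO 4217 only).
CURRENCY = CodeTable(
    name="currency",
    standard="ISO 4217",
    codes={"USD": "US Dollar", "EUR": "Euro", "CNY": "Yuan Renminbi", "KRW": "Won"},
)

# Internally defined standard (hypothetical product line codes).
PRODUCT_LINE = CodeTable(
    name="product_line",
    standard="internal",
    codes={"HB": "Handbags", "WT": "Watches"},
)

print(CURRENCY.is_permissible("EUR"))   # True
print(CURRENCY.is_permissible("EURO"))  # False
print(PRODUCT_LINE.describe("WT"))      # Watches
```

The same structure serves external and internal standards alike; only the `standard` label and the contents differ, which mirrors the article's point that both kinds of code sets live in look-up or domain tables.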

Although there are many classification schemes for the various types of data, almost all recognize the categories of master data, metadata, reference data, and transactional data (Dreibelbis, et al., 2008) (McGilvray, 2008). Of these, reference data is most closely related to master data and metadata, but it is distinct from both. Master data represents the common business objects (such as people, places, and things) that need to be agreed on and shared throughout an enterprise (Dreibelbis, et al., 2008). Reference data is used to author master data instances consistently by governing the permissible set of values for a given master data attribute, but it is not itself master data. Reference data is also different from metadata, which literally means “data about data”. Metadata labels, describes, and characterizes other data and makes it easier to retrieve, interpret, or use information (McGilvray, 2008).

Figure 1 illustrates the concepts of master data, metadata, and reference data with an example. Consider ABC Inc., an international retailer that sells luxury products around the world and whose customers tend to travel and shop across countries. ABC conducts a global customer information integration project to improve customer information and relationship management. The master data table in Figure 1 holds the customer records captured as customers shop around the world. For example, customer Carrie Lee is a Chinese citizen who lives in Beijing. She shops for ABC’s products in Beijing, but because of her job she often travels to the United States and South Korea and purchases ABC’s products there, too. In each region, ABC uses the local language for product and customer information. Therefore, Carrie Lee’s information in the integrated system is in three languages: Chinese, English, and Korean. To represent this information properly as master data entities, a well-defined set of reference data tables is required. The reference data table shown in Figure 1 gives the range of permissible values for the attribute language and follows the ISO 639-1 standard (ISO 639, 2013). The metadata table describes the data usage and data structure specifications for each attribute in the master data table, and it also states the requirement for the reference data table. For example, the definition of language in the metadata table links back to the language reference data table, which follows the ISO 639-1 standard.
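
The relationships described for Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the table contents are illustrative subsets of ISO 639-1 (languages) and ISO 3166-1 alpha-2 (countries), the `store_country` attribute, the customer ID, and the `validate` function are hypothetical, and the actual columns of Figure 1 are not reproduced. The metadata entry for each governed attribute names its reference table, and validation checks every governed master data value against that table.

```python
# Reference data: permissible values per governed attribute.
# Illustrative subsets only (ISO 639-1 languages, ISO 3166-1 alpha-2 countries).
REFERENCE_TABLES = {
    "language": {"zh": "Chinese", "en": "English", "ko": "Korean"},
    "country": {"CN": "China", "US": "United States", "KR": "South Korea"},
}

# Metadata: describes each master data attribute and, where applicable,
# names the reference table that governs its permissible values.
METADATA = {
    "customer_id": {"definition": "Unique customer identifier", "type": "str"},
    "language": {
        "definition": "Language of the record (ISO 639-1 code)",
        "type": "str",
        "reference_table": "language",
    },
    "store_country": {
        "definition": "Country where the purchase was captured (hypothetical)",
        "type": "str",
        "reference_table": "country",
    },
}

# Master data: Carrie Lee's records from three regions
# (hypothetical ID; native-script name fields omitted).
MASTER_RECORDS = [
    {"customer_id": "C001", "language": "zh", "store_country": "CN"},
    {"customer_id": "C001", "language": "en", "store_country": "US"},
    {"customer_id": "C001", "language": "ko", "store_country": "KR"},
]


def validate(record: dict, metadata: dict, reference_tables: dict) -> list[str]:
    """Return the record's reference data violations (empty list if valid)."""
    errors = []
    for attribute, value in record.items():
        table_name = metadata.get(attribute, {}).get("reference_table")
        if table_name is None:
            continue  # attribute is not governed by a reference table
        if value not in reference_tables[table_name]:
            errors.append(
                f"{attribute}={value!r} not in reference table {table_name!r}"
            )
    return errors


for record in MASTER_RECORDS:
    print(validate(record, METADATA, REFERENCE_TABLES))  # [] for each record

# A free-text language name instead of the ISO 639-1 code is rejected.
bad_record = {"customer_id": "C001", "language": "Korean", "store_country": "KR"}
print(validate(bad_record, METADATA, REFERENCE_TABLES))
# ["language='Korean' not in reference table 'language'"]
```

One reason to keep the permissible values in a separate table, and to link to it from the metadata rather than hard-coding codes in each application, is that a single change to the reference data then applies consistently to every system that validates against it.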

Figure 1. An example of master data, metadata, and reference data
