Business Intelligence Architecture in Support of Data Quality

Tom Breur (XLNT Consulting, The Netherlands)
Copyright: © 2014 |Pages: 19
DOI: 10.4018/978-1-4666-4892-0.ch019


Business Intelligence (BI) projects that involve substantial data integration have often proven failure-prone and difficult to plan. Data quality issues trigger rework, which makes it difficult to schedule deliverables accurately. Two things can bring improvement. Firstly, one should deliver information products in the smallest possible chunks, but without adding prohibitive overhead from breaking up the work into tiny increments. This will increase the frequency and improve the timeliness of feedback on the suitability of information products, and hence make planning and progress more predictable. Secondly, BI teams need to provide better stewardship when they facilitate discussions between departments whose data cannot easily be integrated. Many so-called data quality errors do not stem from inaccurate source data, but rather from incorrect interpretation of data. This is mostly caused by departments with misaligned performance objectives interpreting essentially the same underlying source system facts differently. Resolving such differences requires prudent stakeholder management and informed negotiation. In this chapter, the author suggests an innovation to data warehouse architecture to help accomplish these objectives.
Chapter Preview


Companies launch business intelligence projects in pursuit of their broader information governance objectives. To leverage data assets, a centralized business intelligence competence center (BICC) is often appointed, with broad oversight of diverse groups of information consumers. For reasons of consistency and efficiency, a centralized repository of integrated data, a data warehouse, often gets implemented. To mitigate confusion and drive a unified strategy, BICCs will aspire to a “single version of the truth” to ensure all departments sing from the same hymn sheet.

Data governance programs invariably (also) want to ensure high data quality. BI projects are sometimes fraught with data quality issues. Typically, this is for one of three reasons:

  • The content of source system data that was supplied to the data warehouse was inadequate;

  • The source system data were accurate, but the data warehouse transformations turned out to be erroneous;

  • The content of source data was accurate, and data warehouse transformations were technically correct. However, the content of the report did not meet expectations, and further work in exploring requirements and specifying data interpretation should lead to an improved report.

The source of ‘errors’ in reporting typically cannot be known immediately. It needs to be discovered through interaction with end-users of BI reports. This poses considerable risk to the progress of data warehouse projects. To mitigate the risks associated with these various sources of data quality issues, we propose choosing a hyper normalized data warehouse architecture that better supports resolving the data quality issues associated with such projects.

Business intelligence (BI) projects are risky. According to analyst firms such as Gartner, and to the Standish Group reports, these projects have alarmingly high failure rates. In large part, these risks are caused by data quality problems that complicate data integration. When data from disparate business silos are confronted with each other for the first time in the data warehouse, this often surfaces previously unknown data quality issues.

BI teams merely ‘hold’ corporate data (as in: provide stewardship); they do not ‘own’ them. However, under organizational pressure to complete a project, they sometimes feel ‘forced’ into a role of pseudo ownership of data (and accompanying data quality errors) while they struggle to integrate fundamentally misaligned source data in the data warehouse.

This problem can be either mitigated or exacerbated, depending on the architecture chosen for the data warehouse. An optimal architecture for data warehousing should allow interpretation of data to be left to their owners. This is not a responsibility of the BI team. They can facilitate discussions about the content of the data, and shed light on source system characteristics, but the BI team should never be tasked with ‘owning’ (as in: committing to long-term choices about) the interpretation of data quality problems. Interpretation of data, and decisions on how to deal with data quality errors, should be the realm of the (senior) management in charge of the business silos that control the source systems.

To (better) deal with quality issues that arise in the course of data integration, in this chapter I will suggest using an architecture that provides auditable and (easily) traceable data transformations. During project development, data modeling choices need to be changeable (at reasonable cost). Currently, the dominant data warehouse design paradigm (Kimball bus architecture) falls short on providing these features.
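
As a minimal sketch of what auditable and traceable transformations could look like, the Python fragment below keeps source data unaltered, stamps it with its origin and load time, and applies each department's interpretation as a separate, named rule that links back to the raw record. All names here (`RawRecord`, `DerivedRecord`, `load_raw`, `apply_rule`, the `crm_extract` source, and the two "active customer" rules) are hypothetical illustrations and are not taken from the chapter.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class RawRecord:
    """Source data stored as delivered, plus audit attributes (hypothetical)."""
    business_key: str
    payload: dict
    record_source: str   # which system supplied the data
    load_ts: datetime    # when the warehouse received it


@dataclass(frozen=True)
class DerivedRecord:
    """Result of one interpretation rule, traceable to its raw input."""
    business_key: str
    value: Any
    rule_name: str       # which interpretation produced this value
    source: RawRecord    # link back to the unaltered input


def load_raw(business_key: str, payload: dict, record_source: str,
             load_ts: Optional[datetime] = None) -> RawRecord:
    """Store the payload unchanged; only audit attributes are added."""
    ts = load_ts or datetime.now(timezone.utc)
    return RawRecord(business_key, dict(payload), record_source, ts)


def apply_rule(raw: RawRecord, rule_name: str,
               rule: Callable[[dict], Any]) -> DerivedRecord:
    """Apply a named, replaceable interpretation without touching the raw data."""
    return DerivedRecord(raw.business_key, rule(raw.payload), rule_name, raw)


# Two departments interpret the same facts differently (assumed rules).
raw = load_raw(
    "CUST-001",
    {"last_order_days_ago": 200, "has_contract": True},
    record_source="crm_extract",
)
sales_view = apply_rule(raw, "sales_active_v1",
                        lambda p: p["last_order_days_ago"] <= 90)
finance_view = apply_rule(raw, "finance_active_v1",
                          lambda p: p["has_contract"])
```

Because both derived records point at the same raw record, a disagreement between the two views can be traced to the rule rather than to the data, and either rule can be replaced without reloading the source.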

Data warehouse requirements often carry considerable ambiguity. To cope with this ambiguity in requirements and the resulting changes in data integration engineering, I recommend a relatively new hyper normalized data warehouse architecture that enables both incremental development and data model reengineering at limited cost.



In this chapter I will explain why hyper normalized hub-and-spoke architectures provide better support for data governance and data quality management, and are also better suited to more agile development of BI solutions. We know from experience that BI requirements are often ambiguous, and therefore subject to change. That is why you need a modeling paradigm that is resilient to change, and that safeguards against a loss of historical information when changes to the data model need to be made over the course of a complex data integration project.
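
The preview does not spell out the structure of the proposed architecture. Purely as an assumption for illustration, the sketch below shows one way a model can stay resilient to change without losing history: business keys are held apart from their descriptive attributes, attributes are grouped and stored append-only with a load timestamp, and a new attribute group can be added later without rewriting existing rows. `HistoryStore`, `record`, and `as_of` are hypothetical names, not the author's design.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class HistoryStore:
    """Append-only store: keys are kept apart from descriptive attributes."""
    keys: set = field(default_factory=set)
    # Each row: (business_key, attribute_group, load_ts, attributes)
    rows: list = field(default_factory=list)

    def record(self, key: str, attribute_group: str,
               load_ts: datetime, attributes: dict) -> None:
        """Add a new version; earlier versions are never updated or deleted."""
        self.keys.add(key)
        self.rows.append((key, attribute_group, load_ts, dict(attributes)))

    def as_of(self, key: str, attribute_group: str,
              when: datetime) -> Optional[dict]:
        """Return the attributes that were current at the given moment."""
        matches = [r for r in self.rows
                   if r[0] == key and r[1] == attribute_group and r[2] <= when]
        if not matches:
            return None
        return max(matches, key=lambda r: r[2])[3]


store = HistoryStore()
store.record("CUST-001", "address", datetime(2013, 1, 1), {"city": "Utrecht"})
store.record("CUST-001", "address", datetime(2013, 6, 1), {"city": "Amsterdam"})
# A requirement that arrives later becomes a new attribute group;
# the existing address history is left untouched.
store.record("CUST-001", "segment", datetime(2013, 7, 1), {"segment": "retail"})

earlier = store.as_of("CUST-001", "address", datetime(2013, 3, 1))
latest = store.as_of("CUST-001", "address", datetime(2013, 12, 31))
```

The design choice illustrated is that a model change adds rows or groups rather than altering existing ones, so earlier states remain queryable after the change.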
