The Data Swarm: A Next Step for Distributed Data Analytics

Jeffrey Smith (University of St. Thomas, St. Paul, MN, USA) and Manjeet Rege (University of St. Thomas, St. Paul, MN, USA)
Copyright: © 2016 | Pages: 13
DOI: 10.4018/IJIRR.2016010104


The traditional data warehouse model can no longer keep up with the evolving requirements of the data analytics world. As the concept of a logical data warehouse gains momentum, there is a resulting need to move a portion of the analytics closer to where the data is actually created and used. This paper uses the concept of swarm intelligence as the basis for a simple, distributed analytics architecture to help address this need. It illustrates the approach with an example based on a chain of retail stores and demonstrates how this model could simplify the architecture while increasing data availability and decreasing cost.
Article Preview


First, we must understand the evolving nature of the current data analytic environment and what’s driving the need for new approaches to analytics, as well as innovative applications of those approaches. The need to find ways to decentralize the analytic discipline wherever and whenever possible continues to grow.

The modern enterprise is inundated with data: data volumes across the enterprise grow by 35%-50% every year. Gartner analysis shows that Corporate America will process more than 60 terabytes of data each year, which is around a thousand times more than a decade ago (Beath C., Becerra-Fernandez I., Ross J., & Short J., 2012). Because of this dramatic increase, substantial investments are being made in the technologies that service this data, such as data integration, business intelligence tools, and data warehousing infrastructure, as well as in the people needed to develop and support their implementation.

Business intelligence and analytics are at the heart of the user experience in data warehousing. Gartner estimates that spending on them was around $14.1 billion in 2013 and that it will continue to grow at roughly 7% annually over the next few years (Sallam R. L., Tapadinhas J., Parenteau J., Yuen D., & Hostmann B., 2014). Gartner also estimates that the growing field of advanced analytics is currently a market of around $2 billion and that it will continue to grow rapidly as the tools become more user-friendly and the data more accessible (Herschel G., Linden A., & Kart L., 2014).

These investments have driven large-scale business improvements by supporting the decision-making process and streamlining data architectures. However, significant innovations in technology (such as data discovery and interactive visualizations, predictive analytics, in-memory computing, and big data), combined with a much greater variety of users and use cases, are increasing the need for a wholesale transformation of performance, people, process, and platforms. The number of use cases that business analytics must support will continue to multiply, requiring multiple tools and approaches (Chandler, N., 2015).

Gartner estimates that well over 80% of the market continues to follow traditional data warehouse approaches, as seen in Figure 1 (Edjlali R., & Beyer M. A., 2014). However, given the fast-paced technological transformation, the introduction of unstructured and nontraditional data, and new types of advanced analytics, Gartner also predicts that traditional data warehouse practices will be outdated by the end of 2018 (Beyer M. A., & Edjlali R., 2014).

Figure 1. A traditional data warehouse following the combined Inmon/Kimball models (adapted from Inmon W.H., 2015)

The death of the traditional data warehouse is imminent because data warehouses will be expected not only to continue supporting reporting and traditional business intelligence, but also to support three further demands: integrated information for nearly incompatible service-level expectations across different analytic use cases, data provisioning for analytics embedded in operational applications, and hybrid transaction and analytical processing.

For the past five years, Gartner has been promoting a concept it calls the Logical Data Warehouse (LDW) to address this analytic evolution. The LDW rests on the idea that a data warehouse is principally an integrated data management platform that facilitates data asset consolidation and supports time variance in using the data, and is not specifically a physically centralized repository. Over that period, the idea of using different technologies to meet these very different SLAs for analytic data management has gained rapid acceptance in the industry, and for many it has even become preferable (Beyer M. A., 2014).
