Fast Data Processing for Large-Scale SOA and Event-Based Systems

Marcel Tilly (European Microsoft Innovation Center, Munich, Germany) and Stephan Reiff-Marganiec (University of Leicester, Leicester, UK)
DOI: 10.4018/IJSSOE.2015100103

The deluge of intelligent objects providing continuous access to data and services on the one hand, and the demand of developers and consumers to handle these data on the other, require us to think about new communication paradigms and middleware. In hyper-scale systems, such as the Internet of Things, large-scale sensor networks, or even mobile networks, one emerging requirement is to process, procure, and provide information with almost zero latency. This work introduces new concepts for a middleware that enables fast communication by limiting information flow through filtering based on policy obligations, combined with data processing techniques adopted from complex event processing.
Article Preview


Today, there are various mega trends: people talk about big data, cloud computing, service-oriented architecture (SOA), or the Internet of Things (IoT), to name just a few. All these trends share at least one aspect: data! A huge amount of data is produced by a vast number of heterogeneous sources, e.g. sensors, phones, and cars. This data needs to be filtered, processed, and procured. Beyond simply collecting all this data, there is a rapidly growing demand to create timely insights from it. These insights can provide competitive advantages to businesses. Extracting relevant information from data, or correlating data with other data sets, as fast as possible is becoming a key factor for success. Latency, the time data needs to be processed, is becoming more and more critical.

Some of the questions that need to be answered are:

  • How can this data get processed as fast as possible?

  • How can relevant data be separated from irrelevant data?

  • How can data be filtered efficiently and scalably?

  • How can data from distributed, heterogeneous data sources and services be integrated into a system?

  • How can different technologies and interaction patterns be combined to make data flow efficiently?

This paper provides answers to these questions. To achieve almost zero-latency data processing, data must be available at the place where the user needs it, such as at a data provider. So, instead of pulling data from data sources at request time, data should be pushed to such a data provider. This is only the first step towards faster processing of data in terms of providing results with low latency: if the data sources continuously push data to a data provider (e.g. the selector), there is a vast amount of overhead from unnecessarily transferred data, a waste of bandwidth.
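The push-based idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual middleware: the class and method names (`DataSource`, `DataProvider`, `subscribe`) are invented for this example. Sources push readings to the provider as they occur, so a later user query is answered from the provider's cache without a round trip to the source.

```python
from typing import Callable, Optional

class DataSource:
    """A sensor that pushes readings to registered subscribers
    instead of waiting to be polled (names are illustrative)."""

    def __init__(self, name: str):
        self.name = name
        self._subscribers: list = []

    def subscribe(self, callback: Callable[[str, float], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, value: float) -> None:
        # Push the new reading to every subscriber as soon as it arrives.
        for cb in self._subscribers:
            cb(self.name, value)

class DataProvider:
    """Caches the latest values so user requests are served with near-zero latency."""

    def __init__(self):
        self.latest: dict = {}

    def on_data(self, source: str, value: float) -> None:
        self.latest[source] = value

    def query(self, source: str) -> Optional[float]:
        # No round trip to the source at request time: answer from the cache.
        return self.latest.get(source)

provider = DataProvider()
sensor = DataSource("temp-1")
sensor.subscribe(provider.on_data)
sensor.emit(21.5)
print(provider.query("temp-1"))  # 21.5
```

Note the trade-off the text points out: pushing every reading keeps the cache fresh but wastes bandwidth when most readings are irrelevant, which is what the event policies below address.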

For mobile devices, the cost of bandwidth needs to be taken into account. Let us assume the data provider, the one who interacts with the user, knows when the user needs updated data, and the intelligent data sources know about their own situation. The data provider then informs the sources under which changing situation (when) they should inform it about the change of their properties (what). What and when can be expressed as event policies, which are injected into the data sources, so that we can really make use of their intelligence.
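A "when/what" event policy of this kind could look as follows. This is a hedged sketch under the assumptions of the paragraph above; the `EventPolicy` and `SmartSensor` names are hypothetical, not the paper's API. The `when` predicate decides whether a change is worth reporting, and the `what` projection turns a raw reading into the higher-level event the provider cares about.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EventPolicy:
    """A hypothetical 'when/what' obligation injected into a data source."""
    when: Callable[[float, float], bool]  # (last_reported, new_value) -> report?
    what: Callable[[float], dict]         # raw reading -> published event

class SmartSensor:
    def __init__(self, policy: EventPolicy, publish: Callable[[dict], None]):
        self.policy = policy
        self.publish = publish
        self.last_reported: Optional[float] = None

    def on_reading(self, value: float) -> None:
        # Only push when the policy says the change matters (the "when").
        if self.last_reported is None or self.policy.when(self.last_reported, value):
            self.publish(self.policy.what(value))  # project raw data (the "what")
            self.last_reported = value

events = []
policy = EventPolicy(
    when=lambda old, new: abs(new - old) >= 0.5,  # report only changes of 0.5 or more
    what=lambda v: {"kind": "temperature", "celsius": round(v, 1)},
)
sensor = SmartSensor(policy, events.append)
for reading in [20.0, 20.1, 20.2, 21.0]:
    sensor.on_reading(reading)
print(events)  # only 20.0 (first reading) and 21.0 cross the threshold
```

With the policy running on the source itself, the intermediate readings 20.1 and 20.2 never leave the device, which is exactly the bandwidth saving the text describes.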

Thus, each data source becomes responsible for projecting its own fine-grained, raw data onto the more high-level, complex data that the data provider, and ultimately the user, is interested in. The obligations can be made as smart as needed by using various sets of information, such as the prioritization of the data. Consider, for example, an alarm situation with cascading alarms. Such a system has to ensure that the most severe alarms are delivered and that the bandwidth is not occupied by unimportant information. Thus, event policies executed on smart data sources, i.e. intelligent objects, should enable low-capacity filtering by being context-aware.
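The cascading-alarm scenario can be sketched as a priority filter. The severity scale and function name below are illustrative assumptions, not part of the paper's design: when the channel can only carry a limited number of messages, the source delivers the most severe alarms first, so a flood of minor alarms cannot crowd out a critical one.

```python
import heapq

# Illustrative severity ordering: lower number = more severe.
SEVERITY = {"critical": 0, "major": 1, "minor": 2, "info": 3}

def select_alarms(alarms: list, capacity: int) -> list:
    """Context-aware filtering sketch: keep only the `capacity` most
    severe alarms when bandwidth cannot carry the whole cascade."""
    return heapq.nsmallest(capacity, alarms, key=lambda a: SEVERITY[a["severity"]])

cascade = [
    {"id": 1, "severity": "minor"},
    {"id": 2, "severity": "critical"},
    {"id": 3, "severity": "info"},
    {"id": 4, "severity": "major"},
]
print(select_alarms(cascade, capacity=2))
# the critical (id 2) and major (id 4) alarms get through; minor/info are dropped
```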

There are already approaches available that cover parts of the problem. Research has provided techniques to handle and process data with low latency, such as complex event processing (CEP), as well as approaches to distribute processing, such as the actor model. However, most of these tackle only one specific aspect of big data, cloud computing, or SOA. No approach really attempts a holistic answer to new mega trends such as the Internet of Things.

The approach described in this paper combines these promising techniques to enable fast processing of data in hyper-scale, distributed setups. Hyper-scale means that there are millions of data sources, as found in IoT settings. Data sources here can be considered services offering data. This data can change over time, such as the temperature offered by a temperature sensor; other services may offer weather or traffic information, for example.

This paper describes our solution of combining the classical request-response paradigm with event-based approaches and technologies to process data and deliver insights with low latency. Some parts of this work have been published previously (Tilly & Reiff-Marganiec, 2011; Reiff-Marganiec, Tilly, & Janicke, 2014); however, in this paper our ideas, our different contributions, and a new detailed view of the architecture are brought together for the first time.
