MidSemI: A Middleware for Semantic Integration of Business Data with Large-scale Social and Linked Data

Samir Sellami (LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria), Taoufiq Dkaki (IRIT Laboratory, University of Toulouse 2 - Jean Jaurès, Toulouse, France), Nacer Eddine Zarour (LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria) and Pierre-Jean Charrel (IRIT Laboratory, University of Toulouse 2 - Jean Jaurès, Toulouse, France)
Copyright: © 2019 |Pages: 25
DOI: 10.4018/IJISMD.2019040101


The diversification of the web into the Web of Data and social media means that companies need to gather data from many sources to make well-informed market decisions. However, data providers on the web publish data in various data models and may equip it with different search capabilities, thus requiring data integration techniques to access it. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The implementation of the proposed approach is a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches. The results illustrate the added value and usability of the contributed approach.

Data is quickly becoming the critical business asset in today's digital world. Enterprises need to access the valuable data on the web to help make the best-informed market decisions. The strong support that web-based technologies have received from developers, researchers, and practitioners has entirely changed the way knowledge is shared over the Web (Heath & Bizer, 2011). As a result, many data sources from almost any domain are now accessible through the internet. In many cases, these data sources are complementary: the data scattered across them can be connected to find the targeted result. To this end, the Linked Open Data (LOD) initiative suggests a set of principles for publishing structured data in RDF, which employs open standards and can be read by machines. The application of the LOD principles has led to the birth of the Web of Data (Schmachtenberg, Bizer, Jentzsch, & Cyganiak, 2017). RDF aims at being the universal abstract data model, and leading media and public sector organizations have massively adopted it to publish their structured data. Nevertheless, a vast majority of web information services expose and consume non-RDF data. Examples of these services are social media websites, which continue to supply their data via Web APIs in semi-structured formats, notably Extensible Markup Language (XML) or JavaScript Object Notation (JSON). The challenge for organizations is how to combine this rich user-generated information with the data contained in corporate databases. This information aggregation may unveil significant business opportunities to those who can understand the value of the newly merged data.

The access, retrieval, and utilization of the information from these different segments of the Web, along with internal enterprise relational databases (RDB), call for the data to be fully integrated and for the users to be provided with single (and simple) entry points to access it. However, data providers on the Web publish data in various data models, and they may equip it with different search capabilities, e.g., SPARQL endpoints or REST API services, thus requiring data integration techniques that provide users with a unified view of the data. Figure 1 (a) illustrates that the current process of gathering such varied information is onerous and time-consuming for users, because it requires accessing a large number of diverse data sources and then connecting the individual search results manually. Our vision of an automatic integration platform, which semantically combines information from distributed sources, is illustrated in Figure 1 (b). Here the user needs only to interact with the integration platform to query and extract data about entities from a variety of information sources. Regarding data sources, our goal is to retrieve and integrate relevant information from the Social Web, the Web of Data, and internal databases and enterprise spreadsheets.

Figure 1.

Our vision of a semantic integration platform. (a) Traditional manual integration process across different open web data sources, plus internal data. (b) Integration platform process, which semantically aggregates information from distributed sources.

  • The Social Web comprises user-generated content and profiles. It includes social networks such as Facebook and LinkedIn, which provide APIs to query their data.

  • The Web of Data is a further valuable information source, where we can find data sources comprising billions of machine-comprehensible facts. This data space offers relevant background knowledge (e.g., spatial context information) for information aggregation. Examples in this category are the DBpedia and GeoNames datasets.

  • Internal Data. Finally, the primary objective of the platform is to integrate all this open web data with internal enterprise data sources, such as the CRM system and Excel spreadsheets. These data sources are usually structured relational databases (RDB).
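
The single entry point described above can be illustrated with a minimal sketch. It is not the MidSemI implementation: the function names (`make_rdb_wrapper`, `make_static_wrapper`, `mediate`), the `customers` table, and the sample records are all hypothetical. The linked data and social sources are replaced by in-memory stand-ins so that the example runs without network access, and records are merged by exact name match, which is a much cruder rule than the semantic integration the article proposes.

```python
import sqlite3
from typing import Any, Callable, Dict, List

Record = Dict[str, str]
Wrapper = Callable[[str], List[Record]]  # keyword -> matching records


def make_rdb_wrapper(conn: sqlite3.Connection) -> Wrapper:
    """Wrap a relational source (hypothetical 'customers' table)."""
    def search(keyword: str) -> List[Record]:
        rows = conn.execute(
            "SELECT name, city FROM customers WHERE name LIKE ?",
            (f"%{keyword}%",),
        ).fetchall()
        return [{"source": "crm", "name": n, "city": c} for n, c in rows]
    return search


def make_static_wrapper(source: str, records: List[Record]) -> Wrapper:
    """Stand-in for a SPARQL endpoint or Web API wrapper.

    A real wrapper would send a query over HTTP; this one filters a
    fixed list so the sketch stays self-contained.
    """
    def search(keyword: str) -> List[Record]:
        kw = keyword.lower()
        return [{"source": source, **r} for r in records
                if kw in r["name"].lower()]
    return search


def mediate(keyword: str, wrappers: List[Wrapper]) -> Dict[str, Dict[str, Any]]:
    """Send the keyword to every wrapper and merge records by name."""
    merged: Dict[str, Dict[str, Any]] = {}
    for wrapper in wrappers:
        for record in wrapper(keyword):
            key = record["name"].lower()  # assumption: same name = same entity
            entity = merged.setdefault(
                key, {"name": record["name"], "sources": []})
            entity["sources"].append(record["source"])
            for field, value in record.items():
                if field not in ("source", "name"):
                    entity.setdefault(field, value)  # first source wins
    return merged


def build_demo() -> List[Wrapper]:
    """Create one wrapper per source category, filled with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    conn.execute("INSERT INTO customers VALUES ('Acme Corp', 'Toulouse')")
    linked = make_static_wrapper(
        "linked_data", [{"name": "Acme Corp", "country": "France"}])
    social = make_static_wrapper(
        "social", [{"name": "Acme Corp", "followers": "1200"},
                   {"name": "Other Ltd", "followers": "5"}])
    return [make_rdb_wrapper(conn), linked, social]


if __name__ == "__main__":
    print(mediate("acme", build_demo()))
```

Each wrapper hides the query mechanism of its source behind the same keyword interface, so the caller interacts only with `mediate`. Swapping a stand-in for a real SPARQL or REST wrapper would not change the merging step.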
