Privacy-Conscious Data Mashup: Concepts, Challenges and Directions


Mahmoud Barhamgi (Claude Bernard Lyon 1 University, France), Chirine Ghedira (Claude Bernard Lyon 1 University, France), Salah-Eddine Tbahriti (Claude Bernard Lyon 1 University, France), Michael Mrissa (Claude Bernard Lyon 1 University, France), Djamal Benslimane (Claude Bernard Lyon 1 University, France) and Brahim Medjahed (University of Michigan-Dearborn, USA)
DOI: 10.4018/978-1-4666-0146-8.ch016


Modern enterprises across all sectors are increasingly adopting SOA-based data integration architectures to respond rapidly to transient business data needs. In this chapter, the authors analyze a new class of enterprise data integration applications, called Data Mashup, in which data services are composed on the fly to answer new business data demands. The chapter reviews the different approaches to data mashup, discusses their limitations, and identifies the main requirements for data mashup. The authors then propose a declarative data mashup approach that addresses the identified requirements. Finally, the chapter presents some research directions that must be pursued for data mashup technology to mature.

1. Introduction

Since the dawn of the information age, data has been at the center of enterprise applications. The introduction of relational database management systems in the 1970s created a productive new world that enabled developers of data-centric enterprise applications (i.e., data integration applications) to work much more efficiently than ever before. Data-centric application development in the pre-relational era meant hand-writing, tuning, and maintaining large procedural programs to access and manipulate the application’s data. Relational database systems made it possible for developers to write much simpler, declarative queries to accomplish the same tasks. Physical system details such as indexes and clustering were hidden by the relational model, enabling developers to focus first on the logical tasks at hand.

Developers of data-centric applications face a new and different barrier today. Relational databases have been so successful that many of them are available (e.g., Oracle, DB2, SQL Server, and MySQL, to name a few of the most prominent). A typical enterprise is likely to have a number of relational databases within its corporate walls, and information about key business entities such as Customers or Employees is likely to reside in several of these systems. In addition, much of the information, even if it is stored relationally, will be relationally inaccessible because it is under tight application control. Access to application-controlled information must go through the application APIs, as they enforce the rules and logic of the application’s “business objects”. As a result, enterprise data integration application developers currently face a huge integration challenge: a given business entity of interest is now likely to reside in a mix of relational databases, packaged applications, and perhaps even files or legacy mainframe systems and/or applications. When a new, “composite” data-centric application needs to be created from these parts, developers have to fall back on procedural programming. This time, however, it is procedural programming against a wide variety of different subsystems, APIs, and data formats, essentially hand-coding what amounts to a distributed query or update plan. This situation has undermined the enterprise’s ability to respond to rapidly changing business requirements.

To meet this new challenge, a recent trend in modern enterprises has been to apply SOA principles to data integration (Spiess, Karnouskos, Savio, & Baecker, 2009; Janner, Siebeck, Schroth, & Hoyer, 2009). Data is provided to data integration application developers as Data Services (a.k.a. DaaS: Data-as-a-Service), which transparently access and manipulate the pieces of data about the same business object held at the different data sources (Gilpin, Yuhanna, Leganza, Heffner, Hoppermann, & Smillie, 2007; Carey, 2006; Truong & Dustdar, 2009; Dan, Johnson, & Arsanjani, 2007). That is, data services are a new class of services that sit between enterprise application developers and the enterprise’s heterogeneous data sources. They provide a well-documented, platform-independent (and data-source-independent), interoperable and uniform way of interacting with data. They shield enterprise application developers from having to interact directly with the various data sources that give access to business objects (e.g., Customers, Orders, Invoices), thus enabling them to focus solely on the business logic.
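To make the idea of a uniform data service concrete, here is a minimal Python sketch. It assumes a hypothetical `CustomerDataService` interface with a single `get_customer` operation; neither the interface nor the two wrapper classes come from the chapter. One implementation wraps a relational table (an in-memory SQLite database stands in for it), the other wraps a legacy record store of `id|name` lines, yet callers see the same method and the same result shape.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List, Optional


class CustomerDataService(ABC):
    """Hypothetical uniform interface: callers never see the underlying source type."""

    @abstractmethod
    def get_customer(self, customer_id: str) -> Optional[dict]:
        """Return {'id': ..., 'name': ...} or None if the customer is unknown."""


class RelationalCustomerService(CustomerDataService):
    """Wraps a relational database (any sqlite3 connection with a customers table)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_customer(self, customer_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None


class LegacyFileCustomerService(CustomerDataService):
    """Wraps a legacy record store made of 'id|name' lines."""

    def __init__(self, lines: List[str]):
        self.lines = lines

    def get_customer(self, customer_id: str) -> Optional[dict]:
        for line in self.lines:
            cid, name = line.split("|", 1)
            if cid == customer_id:
                return {"id": cid, "name": name}
        return None
```

Application code written only against `CustomerDataService` stays unchanged if one source is migrated or replaced, which is the shielding property described above.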

Key Terms in this Chapter

Service Composition: Web service composition is the process of combining outsourced Web services to offer value-added services on top of existing ones. A comprehensive solution to Web service composition will need to address the service discovery issue as well (where Web service discovery is the process of finding Web services with a given capability).
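As a rough illustration of composition combined with discovery, the sketch below treats each service as a typed function (one input type, one output type) and finds a chain of services by breadth-first search. The `Service` class, the `discover`/`compose`/`run` helpers and the type names are hypothetical; real composition frameworks match on much richer semantic descriptions than a single type label.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str   # hypothetical capability description: what the service consumes
    output_type: str  # ...and what it produces
    invoke: Callable


def discover(registry: List[Service], input_type: str) -> List[Service]:
    """Discovery: find the services whose declared input matches the given type."""
    return [s for s in registry if s.input_type == input_type]


def compose(registry: List[Service], have: str, want: str) -> Optional[List[Service]]:
    """Breadth-first search for the shortest chain of services turning `have` into `want`."""
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        current, chain = queue.popleft()
        if current == want:
            return chain
        for s in discover(registry, current):
            if s.output_type not in seen:
                seen.add(s.output_type)
                queue.append((s.output_type, chain + [s]))
    return None  # no composition reaches the requested output type


def run(chain: List[Service], value):
    """Execute a composed chain by feeding each output into the next service."""
    for s in chain:
        value = s.invoke(value)
    return value
```

For example, given one service mapping `OrderId` to `CustomerId` and another mapping `CustomerId` to `Email`, `compose(registry, "OrderId", "Email")` returns both services in that order.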

Data Mashup (a.k.a. Information Mashup): A special class of application mashups concerned with combining information from several data sources to construct value-added information for business needs. Access to the data sources is often carried out through Web services (services of this type are known as DaaS, Data-as-a-Service, Web services).
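At its simplest, a data mashup joins the outputs of several data services on a shared key. The minimal sketch below assumes two hypothetical data services whose results arrive as lists of dictionaries (customer records and order records) and combines them with a hash join on the key attribute. It is only an illustration of the combining step, not the declarative approach the chapter proposes.

```python
from typing import Dict, Iterable, List


def mashup(left: Iterable[dict], right: Iterable[dict], key: str) -> List[dict]:
    """Hash join: index `right` by `key`, then merge each matching `left` record.

    If both records share a non-key attribute, the value from `right` wins.
    """
    index: Dict[object, List[dict]] = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    result = []
    for rec in left:
        for match in index.get(rec[key], []):
            result.append({**rec, **match})
    return result
```

Records without a partner on the other side are dropped, i.e., this is an inner join; an outer join would be needed if partial information should also be reported.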

Mashup: A mashup is a Web application that integrates data, computation and UI elements provided by several components/applications to create new applications “on the fly”. The concept of mashups originated from the recognition that the number of applications available on the Web is growing very rapidly, and so is the need to combine them to meet user requirements. A typical example is a Web site that “mashes up” two other Web sites, CraigsList and Google Maps: it takes housing information from CraigsList and displays it on Google Maps.

Ontology: An ontology is a formal and explicit specification of a shared conceptualization. “Conceptualization” refers to an abstraction of a domain that identifies the relevant concepts in that domain. “Shared” means that an ontology captures consensual knowledge. An ontology typically consists of a hierarchical description of important concepts in a domain, along with descriptions of the properties of each concept.
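The hierarchical description of concepts and their properties can be sketched as a small tree in which each concept inherits the properties of its ancestors. The `Concept` class and the `Person`/`Patient` example are hypothetical; real ontologies are usually written in a language such as OWL and processed by a reasoner, which this sketch does not attempt to replace.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None           # single-parent hierarchy, for simplicity
    properties: Set[str] = field(default_factory=set)

    def is_a(self, other: "Concept") -> bool:
        """Subsumption check: walk up the hierarchy looking for `other`."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def all_properties(self) -> Set[str]:
        """Own properties plus those inherited from every ancestor."""
        props = set(self.properties)
        if self.parent is not None:
            props |= self.parent.all_properties()
        return props
```

In a data mashup setting, such a shared vocabulary is what lets services from different sources be described, and matched, in the same terms.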

Data Services (i.e., Data-as-a-Service): A data service (i.e., DaaS service) provides a simplified, integrated view of real-time, high-quality information about a specific business entity, such as a customer or product. It can be provided by middleware or packaged as an individual software component. The information that it provides comes from a diverse set of information resources, including operational systems, operational data stores, data warehouses, content repositories, collaboration stores, and even streaming sources in advanced cases.

Privacy: Privacy can be defined as the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others.
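One simplified way to operationalize this definition is a default-deny filter driven by the individual’s own preferences: each (recipient, purpose) pair is mapped to the set of attributes the person agrees to disclose. The `release` function and the preference format are hypothetical assumptions made for illustration, and the “when” dimension (for example, time-limited consent) is not modeled.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical preference format: (recipient, purpose) -> attributes the individual allows.
Preferences = Dict[Tuple[str, str], FrozenSet[str]]


def release(record: dict, preferences: Preferences, recipient: str, purpose: str) -> dict:
    """Return only the attributes the data subject consented to share.

    Any (recipient, purpose) pair without an explicit preference gets nothing
    (default deny), so the individual, not the data holder, decides the extent.
    """
    allowed = preferences.get((recipient, purpose), frozenset())
    return {k: v for k, v in record.items() if k in allowed}
```

A privacy-conscious mashup would apply such a filter to each data service’s output before combining results, so that a join cannot reassemble information the individual chose to withhold from that recipient.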

Query Rewriting (a.k.a. Query Reformulation): A well-developed concept in the data integration field that refers to the process of translating a user query expressed over a mediated schema into a query over the underlying data sources. The mediated schema itself is virtual; i.e., it does not contain any data. That is why queries posed by the user over that schema have to be translated into queries over the real data sources.
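The translation step can be illustrated with a global-as-view (GAV) style mapping, in which each attribute of the virtual mediated relation is defined by a column of one source. The sketch below assumes a hypothetical mediated `Customer` relation whose `name` and `phone` come from a `crm` source and whose `email` comes from a `mail` source, and rewrites a key lookup into one parameterized SQL query per source. It is a deliberately simplified stand-in: it handles only key lookups and does not cover joins across sources or local-as-view mappings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rewrite(
    select: List[str],
    key_value: str,
    mapping: Dict[str, Tuple[str, str]],  # mediated attribute -> (source, source column)
    key_columns: Dict[str, str],          # source -> column holding the shared entity key
) -> Dict[str, Tuple[str, tuple]]:
    """Translate a key lookup over the virtual mediated schema into per-source queries."""
    columns_by_source: Dict[str, List[str]] = defaultdict(list)
    for attr in select:
        source, column = mapping[attr]
        columns_by_source[source].append(column)
    queries = {}
    for source, columns in columns_by_source.items():
        # Identifiers come from the trusted mapping; the user-supplied value is a bound parameter.
        sql = f"SELECT {', '.join(columns)} FROM {source} WHERE {key_columns[source]} = ?"
        queries[source] = (sql, (key_value,))
    return queries
```

Only the sources that actually hold requested attributes are queried; assembling their answers into a single `Customer` tuple is then a join on the shared key.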
