A Generic Model for Universal Data Storage and Conversion and Its Web Based Prototypical Implementation

Andreas Unterweger (Salzburg University of Applied Sciences, Austria), Bernadette Himmelbauer (Salzburg University of Applied Sciences, Austria), Simon Kranzer (Salzburg University of Applied Sciences, Austria), Peter Ott (Salzburg University of Applied Sciences, Austria), Robert Merz (Salzburg University of Applied Sciences, Austria) and Gerhard Jöchtl (Salzburg University of Applied Sciences, Austria)
DOI: 10.4018/jitwe.2012010105

Abstract

This paper presents a prototypical, Web based data conversion framework and its underlying data representation principles, which allow conversions from and to any data format. To this end, a data model is proposed which allows storing values of arbitrary types, including inter-data dependencies and meta information. Furthermore, an Extensible Markup Language (XML) based model to describe data formats is provided, which allows specifying programs that convert data represented in existing formats both from and to the proposed data model. It is shown that these programs are Turing complete, thus allowing the same arbitrarily complex conversions that are possible with Extensible Stylesheet Language Transformations (XSLT) or the C programming language. Finally, the components of a prototypical Web based implementation in the form of a validator, a data converter and a data generator are described. In combination with a data editor, parts of this prototypical implementation are already employed in several industrial use cases and other research projects to transform data between different formats.

Introduction

Whenever huge quantities of information from industrial applications need to be stored, transforming and manipulating the data is a complex issue. Vast amounts of data, combined with proprietary data formats generated by different data sources, pose a great challenge for data handling. To address this issue, we present a fully integrated solution which enables the storage and transformation of arbitrary data formats.

Our approach is based on two essential models: a generic data model and an XML model. The generic data model allows storage of arbitrary data formats, including meta information and interdependencies. The XML model specifies data formats and serves as a transformation language from and to the original data representations. In addition, a universal data converter and a data generator are implemented in a prototype framework. Figure 1 shows our complete conversion approach. Based on the XML model, a transformation program needs to be specified only once for a particular data format. The same program is then used for both conversion and generation.

Figure 1.

Overview of the data conversion and generation framework based on the proposed data and XML model. Data available in different formats is converted to the data model representation using the data converter which transforms data based on a program specified by the XML model. The data generator enables the export into arbitrary formats.

Similar to our XML model, a transformation language has been realized in XSLT by the W3C (2007). While both their approach and ours are Turing complete (Brainerd & Landweber, 1974) and allow the conversion of arbitrarily complex data formats, XSLT requires a separate data format specification for each transformation direction (input and output). Furthermore, XSLT does not specify how storage inside the database is handled, while in our approach storage structures are explicitly defined by the generic data model. biXid, a bidirectional transformation language presented by Kawanaka and Hosoya (2006), is designed for the transformation of XML formats only, as is XMLTrans, proposed by Walker, Petitpierre, and Armstrong (2000). Another related language targeting model transformation is DSLTrans, proposed by Barroca, Lucio, Amaral, Felix, and Sousa (2010), which, however, is not Turing complete.

Besides lacking a generic data model (see XSLT above), parser generators like ANTLR (Parr, 2007) cannot be used for text generation, while text processing software like awk (IEEE, 2004) or commercial data converters like Altova MapForce (Altova, 2011) impose limits on data formats and/or the complexity of the conversion. As this makes them and all similar solutions either non-universal or not as flexible as our approach, they are not reviewed in detail.

The paper is structured as follows: first, the XML model as the core part of our approach which allows for data format description and conversion is explained. After showing the Turing completeness of said XML model in the subsequent section, the generic data model for storing data values, their dependencies and the corresponding meta information is described. The Web based prototypical implementation of a universal data converter and data generator based on the two models, i.e. the data and the XML model, is described in the “Implementation” section.

The XML Model

In order to specify the format of data to be parsed or generated, we developed an XML model, implemented in the form of an XML schema (W3C, 2004), whose expressiveness is sufficient to model any computer program, as shown in the subsequent section. By design, programs specified by the model, i.e. data format descriptions, are capable of both parsing and data generation. As a result, a single description of a data format suffices to both read and write data of this format.
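
The idea of a single format description driving both directions can be illustrated with a minimal sketch. It is not the XML schema proposed in this paper: the `format` and `field` elements, their attributes, and the helper functions `load_format`, `parse_record` and `generate_record` are hypothetical names chosen for illustration, and the example covers only a flat, separator-delimited record rather than arbitrary formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical format description. Element and attribute names are
# illustrative assumptions, not the schema defined by the paper.
FORMAT_XML = """
<format separator=";">
  <field name="sensor" type="str"/>
  <field name="timestamp" type="int"/>
  <field name="value" type="float"/>
</format>
"""

# Maps the type names used in the description to Python converters.
CASTS = {"str": str, "int": int, "float": float}


def load_format(xml_text):
    """Read the description once; the result serves both directions."""
    root = ET.fromstring(xml_text.strip())
    fields = [(f.get("name"), CASTS[f.get("type")]) for f in root.findall("field")]
    return root.get("separator"), fields


def parse_record(line, fmt):
    """Convert one line in the described format into a dictionary."""
    separator, fields = fmt
    parts = line.split(separator)
    if len(parts) != len(fields):
        raise ValueError("record does not match the format description")
    return {name: cast(part) for (name, cast), part in zip(fields, parts)}


def generate_record(record, fmt):
    """Write a dictionary back out using the same description."""
    separator, fields = fmt
    return separator.join(str(cast(record[name])) for name, cast in fields)


fmt = load_format(FORMAT_XML)
record = parse_record("T1;1325376000;21.5", fmt)
line = generate_record(record, fmt)
```

Here `parse_record` and `generate_record` both consume the value returned by `load_format`, so changing the description changes reading and writing consistently. A Turing complete description language, as claimed for the XML model, would additionally need control flow, which this sketch omits.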
