Formal Framework of XML Document Schema Design

Zurinahni Zainol (University of Hull, UK, & Universiti Sains Malaysia, Malaysia) and Bing Wang (University of Hull, UK)
Copyright: © 2012 | Pages: 44
DOI: 10.4018/ijirr.2012010103


Designing “good” XML documents is a difficult task for a database designer. Although many theories for XML database design have been proposed, no commercial design tool has been developed to assist the XML document designer. In this paper, the authors present a formal framework for XML document design that combines a conceptual model of XML schema, called Graph-Document Type Definition (G-DTD), with a theory of database normalization. The framework is intended as a blueprint to help XML database designers perform XML document schema design quickly and accurately. The G-DTD is used to describe the structure of XML documents at the schema level. A set of normal forms for G-DTD, based on rules proposed by Arenas and Libkin and by Lv et al., provides a guideline for arriving at a well-designed schema for XML documents. The authors develop a prototype of XML document schema design using the Z formal specification language. Finally, the formal specification is validated through a case study to check its correctness and consistency. This gives confidence that the authors’ prototype can be implemented successfully to automate XML document design.
Article Preview

1. Introduction

With the wide use of the Web and the availability of a huge amount of electronic data, XML (eXtensible Markup Language) has become a standard means of information representation and exchange over the Web. Its usage has increased extensively in many commercial applications with complex data structures, such as manufacturing, bioinformatics, B2B (Business to Business), medicine and geographical data (Powell, 2007; Ma & Yan, 2007; Pankowski, 2009). Effective means of managing XML documents as databases are therefore needed to support querying and consistent, efficient storage. Various databases, including relational, object-oriented, and object-relational databases, have been used for mapping to and from XML documents (Florescu & Kossmann, 1999; Runapongsa & Patel, 2002). Among these, most researchers use a relational database as persistent storage, since its maturity makes it the more promising alternative. However, this approach has disadvantages: it does not support complex data structures such as scientific data well, because it cannot retain the original form of XML documents (Bourret, 2007). This problem has led to the development of native XML database systems for a number of applications, and their use is increasing rapidly because of their ability to hold and manage highly complex data structures (Bourret, 2007; Kohler, 2004; Lee et al., 2010). Such applications may use native XML database facilities (Kanne & Moerkotte, 2000) to store and update XML data (Tatarinov et al., 2001). A native XML database stores XML documents directly, without converting or reformatting them into another format, which reduces processing time and provides better performance.
However, native XML databases are still in their infancy and are not as mature as traditional databases (e.g., relational databases); hence many important problems and questions remain unanswered, especially regarding the principles of XML database design (Arenas, 2006; Schewe, 2005; Libkin, 2007).

It is important to design good XML documents for the sake of readability and manageability. A good design contains no duplicate information and stores correct and complete information. Duplicate information (also called redundant data) wastes space and increases the likelihood of errors and inconsistencies. Correctness and completeness matter because, if the document contains incorrect information, any reports that pull information from the document will also contain incorrect information, and any decisions based on those reports will be misinformed. As with traditional databases, the management of XML documents requires capabilities for handling integrity, consistency, data dependency, redundancy, views, access rights, integration, and normal forms (Yu & Jagadish, 2008; Libkin, 2007; Arenas & Libkin, 2004; Dobbie, 2000; Feng et al., 2002). Among the important problems related to XML document design are data redundancy and update anomalies.
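The redundancy and update-anomaly problem can be illustrated with a small sketch. The document below is hypothetical data, and the element names, attribute names and function names are assumptions made for illustration only; they are not taken from the authors' G-DTD framework or prototype. A student's name is stored once per course taken, so the same fact is repeated, and updating only one copy leaves the document inconsistent.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document: student s1's name is stored under every course taken.
DOC = """
<courses>
  <course cno="c1">
    <title>Databases</title>
    <student sno="s1"><name>Ali</name><grade>A</grade></student>
    <student sno="s2"><name>Mei</name><grade>B</grade></student>
  </course>
  <course cno="c2">
    <title>Logic</title>
    <student sno="s1"><name>Ali</name><grade>B</grade></student>
  </course>
</courses>
"""

def names_by_student(xml_text):
    """Collect every stored name for each student number (sno)."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    for student in root.iter("student"):
        seen[student.get("sno")].append(student.findtext("name"))
    return dict(seen)

def redundant_students(xml_text):
    """Students whose name is stored more than once (redundant data)."""
    return {sno: names
            for sno, names in names_by_student(xml_text).items()
            if len(names) > 1}

def inconsistent_students(xml_text):
    """Students whose stored copies of the name disagree (update anomaly)."""
    return sorted(sno
                  for sno, names in names_by_student(xml_text).items()
                  if len(set(names)) > 1)

if __name__ == "__main__":
    print(redundant_students(DOC))
    # Simulate a partial update: only the first copy of the name is changed.
    updated = DOC.replace("<name>Ali</name>", "<name>Alya</name>", 1)
    print(inconsistent_students(updated))
```

In this sketch the dependency "student number determines student name" is hard-coded into the check. A normalized design would store each name in one place, so that a single update could not leave conflicting copies behind.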
