1. Introduction
With the widespread use of the Web and the availability of huge amounts of electronic data, XML (eXtensible Markup Language) has become a standard means of representing and exchanging information over the Web. Its usage has increased extensively in commercial applications with complex data structures, in domains such as manufacturing, bioinformatics, B2B (business to business), medicine, and geographical data (Powell, 2007; Ma & Yan, 2007; Pankowski, 2009). Effective means of managing XML documents as databases are therefore needed to support querying and consistent, efficient storage. Various kinds of databases, including relational, object-oriented, and object-relational databases, have been used for mapping to and from XML documents (Florescu & Kossmann, 1999; Runapongsa & Patel, 2002). Among these, most researchers use a relational database as persistent storage because its maturity makes it a promising alternative. However, this approach has disadvantages: it does not support complex data structures such as scientific data well, because it cannot retain the original structure of XML documents (Bourret, 2007). This problem has led to the development of native XML database systems for a number of applications, and their use is increasing rapidly because of their ability to hold and manage highly complex data structures (Bourret, 2007; Kohler, 2004; Lee et al., 2010). Such applications may use native XML database facilities (Kanne & Moerkotte, 2000) to store and update XML data (Tatarinov et al., 2001). A native XML database stores XML documents directly, without converting or reformatting them into another format, which reduces processing time and improves performance.
However, native XML databases are still in their infancy and are not as mature as traditional databases (e.g., relational databases). Many important problems and questions therefore remain unanswered, especially regarding the principles of XML database design (Arenas, 2006; Schewe, 2005; Libkin, 2007).
It is important to design XML documents well for the sake of readability and manageability. A good design contains no duplicate information and stores correct and complete information. Duplicate information (also called redundant data) wastes space and increases the likelihood of errors and inconsistencies. Correctness and completeness matter because, if a document contains incorrect information, any report that pulls information from it will also contain incorrect information, and any decisions based on those reports will be misinformed. As with traditional databases, the management of XML documents requires capabilities for handling integrity, consistency, data dependency, redundancy, views, access rights, integration, and normal forms (Yu & Jagadish, 2008; Libkin, 2007; Arenas & Libkin, 2004; Dobbie, 2000; Feng et al., 2002). Among the important problems in XML document design are data redundancy and update anomalies.
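To make the link between redundancy and update anomalies concrete, the following is a minimal sketch in Python and is not taken from the article. The course and student document, its element and attribute names (`course`, `student`, `sno`, `name`), and the helper functions are all hypothetical and chosen only for illustration. The sketch assumes that a student number should determine a single student name, stores that name once per enrolment, and then updates only one of the copies.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document: the name of student s1 is stored once per course,
# so the same fact appears twice (redundant data).
DOC = """
<courses>
  <course cno="c1">
    <student sno="s1"><name>Ann Lee</name></student>
    <student sno="s2"><name>Bob Tan</name></student>
  </course>
  <course cno="c2">
    <student sno="s1"><name>Ann Lee</name></student>
  </course>
</courses>
"""


def occurrences(root):
    """Count how many <student> elements exist for each student number."""
    counts = defaultdict(int)
    for student in root.iter("student"):
        counts[student.get("sno")] += 1
    return dict(counts)


def names_by_student(root):
    """Map each student number to the set of names stored for it."""
    names = defaultdict(set)
    for student in root.iter("student"):
        names[student.get("sno")].add(student.findtext("name"))
    return dict(names)


root = ET.fromstring(DOC)

# Redundancy: student numbers whose data is stored more than once.
redundant = {sno for sno, n in occurrences(root).items() if n > 1}

# Update anomaly: change the name in only the first copy for s1.
first_copy = root.find(".//student[@sno='s1']/name")
first_copy.text = "Ann Lee-Wong"

# The document now holds two different names for the same student number.
inconsistent = {sno for sno, ns in names_by_student(root).items() if len(ns) > 1}

print("redundant:", redundant)
print("inconsistent after partial update:", inconsistent)
```

In this sketch the inconsistency arises only because the name is repeated under every course. A design that stored each student's name once and referred to it by student number would not allow the partial update to leave two conflicting values.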