XML has been widely adopted for its flexible and self-describing nature. However, relational data will continue to coexist with XML for several reasons, one of which is the high cost of migrating everything to XML. In this context, data designers face the problem of modeling both relational and XML data within an integrated environment. This chapter highlights important questions on hybrid XML-relational database design and discusses use cases, requirements, and deficiencies in existing design methodologies, especially in light of data and schema evolution. The authors’ analysis results in several design guidelines and a series of challenges to be addressed by future research.
Enterprise data design has become much more complex than modeling traditional data stores. The data flowing in and out of an enterprise are no longer just relational tuples, but also XML data in the form of messages and business artifacts such as purchase orders, invoices, contracts, and other documents. Moreover, regulations (such as the Sarbanes-Oxley Act) require much of these data (both relational and XML) to be versioned and persisted for audit trails. Last but not least, the competitiveness of an enterprise is often a function of its business agility, that is, its ability to change with the changing market. Consequently, enterprise data design needs to cope with different types of data, changing data, and data schema evolution.
Relational database management systems (RDBMSs) are the dominant technology for managing enterprise data stores. Even if the enterprise data are more suitably managed as XML, the cost of migrating to XML databases may be prohibitive. Therefore, relational data will continue to persist in the database. On the other hand, the widespread use of XML data requires the ability to manage and retrieve XML information. A simple solution is to store XML data as character large objects (CLOBs) in an RDBMS, but query processing is then inefficient because the XML CLOBs must be parsed for every query. Another solution, adopted by most commercial RDBMSs, is shredding XML data into relational tables (for example, Florescu & Kossmann, 1999; Shanmugasundaram, 2001). However, shredding does not handle XML schema changes efficiently. Hence, a native XML database that stores XML data in a hierarchical format is still required. Such specialized native XML databases have been developed (for example, Jagadish, 2002), and some even support relational data as well (for example, Halverson, 2004).
Nevertheless, neither a pure relational nor a pure XML database meets all the needs of enterprise data management. Ideally, a hybrid database that supports both relational and XML data is the best solution to model, persist, manage, and query both kinds of data in a unified manner. Some commercial RDBMSs have begun to support such hybrid XML-relational data models (e.g., IBM’s DB2 v.9). Although employing a hybrid solution seems to be a straightforward idea, in reality it involves a complex system with many options that may easily confuse designers. Likewise, we noticed that most users are still uncertain about how exactly to model an XML database, not to mention a hybrid XML-relational one.
In this context, the focus of this chapter is to discuss how to design a hybrid XML-relational database. Note that we are not concerned with designing a database system, but rather a set of relations containing relational and XML data. The contributions and the organization of this chapter are as follows.
We present a methodology for designing XML databases (without considering any interaction with relational data).
We review some of the most relevant real-world scenarios that motivate the use of a hybrid XML-relational database.
We present and discuss the challenges of defining a hybrid XML-relational model. We present a set of modeling ideas that serve as an initial solution for such complex modeling issues. We also discuss what else is needed for a more complete solution, that is, the open issues in the modeling phase.
Finally, we discuss some related work and conclude this chapter with an overview of open problems.
This section presents a brief review of relational database design, which we assume is well known in the computer science community. Traditionally, the design of relational databases is structured into three phases, as follows.