Introduction
Nowadays, several applications using XML repositories (e.g., banking, accounting, personnel management, airline reservations, weather monitoring and forecasting, e-government and e-commerce) are temporal in nature and require a full history of data and schema changes, which must be managed efficiently, consistently, and transparently with regard to the end user. For generic temporal databases (Dyreson & Grandi, 2009), XML provides excellent support for temporally grouped data models (Zaniolo & Wang, 2008), which have long been considered the most natural and effective representations of temporal information (Clifford et al., 1995). Moreover, schema versioning has long been advocated as the most appropriate solution for supporting a complete data and schema history in databases (De Castro et al., 1997; Grandi, 2002).
In a temporal setting, XML data can evolve along transaction time and/or valid time; thus, they can have a transaction-time, a valid-time, or a bitemporal format. When XML data of different temporal formats can coexist in the same XML repository, we speak of a multitemporal XML repository.
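The distinction between the three temporal formats, and the notion of a multitemporal repository, can be illustrated with a minimal Python sketch. The names `TemporalFormat`, `temporal_format`, and `is_multitemporal` are hypothetical and introduced here only for illustration; the sketch also assumes, as a simplification, that a repository counts as multitemporal when it actually holds documents of more than one format.

```python
from enum import Enum


class TemporalFormat(Enum):
    """The three temporal formats an XML document can have (illustrative names)."""
    TRANSACTION_TIME = "transaction-time"
    VALID_TIME = "valid-time"
    BITEMPORAL = "bitemporal"


def temporal_format(has_transaction_time: bool, has_valid_time: bool) -> TemporalFormat:
    """Derive the format from the time dimensions along which the data evolve."""
    if has_transaction_time and has_valid_time:
        return TemporalFormat.BITEMPORAL
    if has_transaction_time:
        return TemporalFormat.TRANSACTION_TIME
    if has_valid_time:
        return TemporalFormat.VALID_TIME
    raise ValueError("a temporal document needs at least one time dimension")


def is_multitemporal(formats) -> bool:
    """Assumption: a repository is multitemporal if its documents
    do not all share the same temporal format."""
    return len(set(formats)) > 1


# Hypothetical repository: one format per stored document.
repository = [
    temporal_format(has_transaction_time=True, has_valid_time=False),
    temporal_format(has_transaction_time=True, has_valid_time=True),
]
print(is_multitemporal(repository))
```

In this reading, a repository whose documents are all valid-time is temporal but not multitemporal, whereas mixing, say, transaction-time and bitemporal documents makes it multitemporal.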
Although schema versioning is required by several applications using multitemporal XML repositories, neither existing XML DBMSs nor XML tools currently support this feature (Colazzo et al., 2010). Therefore, XML Schema designers and developers have to employ ad hoc methods to manage schema versioning.
To propose a general approach to schema versioning in multitemporal XML repositories, we considered the following possible choices: (i) to have different levels of schema specification, that is, one level for the data structure and one or more levels for the temporal dimensions, and (ii) to push the possible multitemporality one level higher. In this context, a natural question is: “What is the right way to treat XML documents that share the same data structure but have different time dimensions?” Hence, we addressed the problem of defining the levels we need and the mappings between them.
After surveying the state of the art of (multi-)temporal XML data models supporting schema versioning, we concluded that the resulting overall framework would not be very dissimilar from the one introduced by Snodgrass and colleagues (Currim et al., 2004; Dyreson et al., 2006; Snodgrass et al., 2008), named τXSchema. τXSchema is an infrastructure, composed of an XML schema language and a suite of tools, for constructing and validating temporal XML documents under schema versioning. The τXSchema language extends the XML Schema language (W3C, 2004) to explicitly support time-varying XML documents. τXSchema has a three-level architecture for specifying a schema for time-varying data. The first level is the conventional schema, a standard XML Schema document that describes the structure of a standard XML document without any temporal aspect. The second level consists of the logical annotations of the conventional schema, which identify the elements that can vary over time. The third level consists of the physical annotations of the conventional schema, which describe where timestamps should be placed and how the time-varying aspects should be represented.
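The three-level separation can be made concrete with a small Python sketch. It does not reproduce the actual τXSchema annotation syntax: the classes `LogicalAnnotations` and `PhysicalAnnotations`, their fields, and the sample `employee` schema are assumptions made purely for illustration. The sketch only shows the idea that the conventional schema stays a plain XML Schema document, while the two annotation levels refer to its elements from the outside.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XS = "{http://www.w3.org/2001/XMLSchema}"

# Level 1: conventional schema, a plain XML Schema document with no temporal aspect.
CONVENTIONAL_SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="salary" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


@dataclass
class LogicalAnnotations:
    """Level 2 (hypothetical model): which elements vary over time, and along which dimensions."""
    time_varying: dict  # element name -> set of time dimensions


@dataclass
class PhysicalAnnotations:
    """Level 3 (hypothetical model): where the timestamps of an element are placed."""
    timestamp_location: dict  # time-varying element name -> element that carries the timestamp


def declared_elements(xsd_text: str) -> set:
    """Collect the names of all elements declared in the conventional schema."""
    root = ET.fromstring(xsd_text)
    return {e.get("name") for e in root.iter(XS + "element") if e.get("name")}


def undeclared_targets(xsd_text: str, logical: LogicalAnnotations,
                       physical: PhysicalAnnotations) -> set:
    """Return annotated element names that the conventional schema does not declare."""
    referenced = (set(logical.time_varying)
                  | set(physical.timestamp_location)
                  | set(physical.timestamp_location.values()))
    return referenced - declared_elements(xsd_text)


logical = LogicalAnnotations(time_varying={"salary": {"valid-time"}})
physical = PhysicalAnnotations(timestamp_location={"salary": "employee"})
print(undeclared_targets(CONVENTIONAL_SCHEMA, logical, physical))
```

Keeping the annotations outside the conventional schema means that the same structural description could, in principle, be paired with different logical or physical annotations, which is the property the levels are meant to provide.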
Finally, we faced two options: either to extend the τXSchema approach or to propose a completely different one. We chose the former, for the following reasons.
1. We arrived at a similar requirement for different levels of schema specification, so any alternative approach we could propose would not be far from the τXSchema principles.
2. Had we decided to move away from τXSchema, we would have had to justify that choice very convincingly (e.g., by highlighting strong limitations of the τXSchema approach that we need to overcome).
3. The τXSchema approach is well known in the research community, so it seemed better to use it as a starting point than to put forward a brand new proposal.
4. The τXSchema approach leaves enough room for extensions; thus, on top of it, we could define a set of schema changes and solve the semantics of change and change propagation problems for such operations.