Digital libraries are systems that contain organized collections of objects; in their most basic function they mirror the traditional library of paper documents. Most of the information in a digital library's collections consists of documents, which can evolve over time. That is, a document can be modified to produce a new document, and digital library users may want access to any of those versions. This introduces the problem of versioning into digital libraries, a problem that is also of interest to the hypertext and Semantic Web communities. Domains in which document evolution is a very important issue include the legislative domain (Arnold-Moore, 1997; Martínez González, de la Fuente, Derniame & Pedrero, 2003a; Vitali, 1999), the management of errata in scientific articles (Poworotznek, 2003), software construction (Conradi & Westfechtel, 1998), and collaborative e-learning (Brooks, Cooke & Vassileva, 2003).
As for the issues of interest related to document versions, we distinguish seven categories:
What can be versioned?
This question can be considered from two perspectives. The first perspective treats the objects stored in the system as atomic units of information that cannot undergo partial changes. This is the typical situation in Web and hypertext environments. Hypertext nodes (documents, files, and others) can change (be substituted, deleted, or inserted), and the hypertext structure can also change (objects may change location, some may disappear, and others may change their references to other objects), but each document is considered an atomic item that is not subdivided into other objects: changes always concern the whole document. The second perspective considers the evolution of the documents used by digital library users, which may or may not correspond one-to-one to the objects stored in the digital library (Arms, 1997), and of XML documents. Changes in this case can affect any component of a document: its content, part of it (e.g., some nodes in XML documents), its internal structure, or its references (citations within a document, which are part of that document).
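The difference between the two perspectives can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the sample documents, the `id` attribute used to identify nodes, and the function names `changed_whole` and `changed_nodes` are all assumptions made for illustration. The first function treats a document as an atomic item, while the second reports which identified nodes differ between two versions.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same legislative document.
V1 = "<law><article id='a1'>Text A</article><article id='a2'>Text B</article></law>"
V2 = "<law><article id='a1'>Text A</article><article id='a2'>Text B, amended</article></law>"


def changed_whole(old: str, new: str) -> bool:
    """Atomic view: the document as a whole either changed or it did not."""
    return old != new


def changed_nodes(old: str, new: str) -> list[str]:
    """Fine-grained view: ids of nodes that were added, removed, or whose text differs.

    Assumes every node of interest carries a unique 'id' attribute.
    """
    old_text = {e.get("id"): e.text for e in ET.fromstring(old).iter() if e.get("id")}
    new_text = {e.get("id"): e.text for e in ET.fromstring(new).iter() if e.get("id")}
    ids = old_text.keys() | new_text.keys()
    return sorted(
        i for i in ids
        if i not in old_text or i not in new_text or old_text[i] != new_text[i]
    )


print(changed_whole(V1, V2))  # the atomic view only knows that something changed
print(changed_nodes(V1, V2))  # the fine-grained view locates the change
```

Under the atomic view, a system would store or replace the whole document; under the fine-grained view, it could record a new version for the affected node only.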
Key Terms in this Chapter
XQuery: XML Query Language. It is a W3C Recommendation.
XSLT: XSL Transformations. A transformation language for XML documents. It expresses rules for transforming a source tree into a result tree. It is a W3C Recommendation.
Hypertext: The organization of information units as a network of associations, which a user can choose to resolve. Hypertext links are the instances of such associations.
Digital Library: A set of electronic documents organized in collections, plus the system that provides access to them. Digital libraries are the digital counterpart of traditional libraries.
Version Control: Set of mechanisms that support object evolution in computer applications.
Referential Integrity: In hypertext, a measure of the reliability of a reference to its endpoints. A reference has the property of referential integrity if it is always possible to resolve it. When references are represented as links, it is called ‘link integrity’.
Versions: Variations of an object with a high degree of similarity. Document versions are never completely equal, but they are similar enough to be recognized as the same document.
XML: Extensible Markup Language. Markup language for structured documents. Structure is represented with textual markup that intermixes with document content. XML is a recommendation from the World Wide Web Consortium (W3C).
Multiversion XML Document: An XML document in which each node can have several branches that correspond to its different versions.
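As an illustration of the last term, the following Python sketch shows one possible in-memory reading of a multiversion XML document. It is a sketch under stated assumptions rather than the representation used by any particular system: the class names `Branch` and `MVNode`, the integer version numbers, and the rule that a branch stays valid until a later branch supersedes it are all hypothetical. Text is not XML-escaped, to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Branch:
    """One version of a node's content (hypothetical structure)."""
    version: int                      # first document version in which this branch applies
    text: str = ""
    children: list["MVNode"] = field(default_factory=list)


@dataclass
class MVNode:
    """A node that holds one branch per version of itself."""
    tag: str
    branches: list[Branch] = field(default_factory=list)

    def resolve(self, version: int) -> Optional[Branch]:
        """Return the latest branch whose version number does not exceed `version`."""
        valid = [b for b in self.branches if b.version <= version]
        return max(valid, key=lambda b: b.version) if valid else None


def snapshot(node: MVNode, version: int) -> str:
    """Serialize the document as it stood at `version` (no XML escaping)."""
    branch = node.resolve(version)
    if branch is None:
        return ""  # the node did not exist yet at this version
    inner = branch.text + "".join(snapshot(c, version) for c in branch.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


# A law whose single article was amended in version 2.
article = MVNode("article", [Branch(1, "Original text"), Branch(2, "Amended text")])
law = MVNode("law", [Branch(1, children=[article])])

print(snapshot(law, 1))  # <law><article>Original text</article></law>
print(snapshot(law, 2))  # <law><article>Amended text</article></law>
```

Only the amended article carries a second branch; the unchanged root node is shared by both versions, which is the main appeal of storing versions per node rather than per document.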