Files are Siles: Extending File Systems with Semantic Annotations

Bernhard Schandl (University of Vienna, Austria) and Bernhard Haslhofer (University of Vienna, Austria)
Copyright: © 2010 | Pages: 21
DOI: 10.4018/jswis.2010070101

Abstract

With the increasing storage capacity of personal computing devices, the problems of information overload and information fragmentation have become apparent on users’ desktops. On the Web, semantic technologies address these problems by adding a machine-interpretable information layer on top of existing resources. It has been shown that applying these technologies to desktop environments is helpful for end users. However, certain characteristics of the Semantic Web architecture that are commonly accepted in the Web context are not desirable for desktops. To overcome these limitations, the authors propose the sile model, which combines characteristics of the Semantic Web and file systems. This model is a conceptual foundation for the Semantic Desktop and serves as the underlying infrastructure on which applications and further services can be built. The authors present one such service, a virtual file system based on siles, which allows users to semantically annotate files and directories while remaining fully compatible with traditional hierarchical file systems. The authors also discuss how Semantic Web vocabularies can be applied for meaningful annotation of files. Finally, they present a prototypical implementation of the model and analyze the performance of typical access operations, both on the file system level and on the metadata level.

Introduction

Large amounts of information are stored on personal desktops. We use our personal computing devices, both mobile and stationary, to communicate, to write documents, to organize multimedia content, to search for and retrieve information, and much more. With the increasing computing and storage power of such devices, we face the problem of information overload: the amount of data we generate and consume is constantly increasing, and because cheap storage space is readily available, every bit of information is kept. Another problem is even more prevalent on the desktop than on the Web: information fragmentation. Data of different kinds are stored in heterogeneous silos, and, in contrast to the Web, where hyperlinks can be defined between documents and across site boundaries, there exist only limited means to define and retrieve relationships between different desktop resources. In the best case, such relationships can be represented using additional infrastructure (e.g., relational databases or specific applications), but these are usually not tightly integrated with file systems.

The Semantic Web aims to deal with the problems mentioned above by adding a layer on top of the existing Web infrastructure, in which descriptions of web resources are expressed in the Resource Description Framework (RDF) using commonly accepted vocabularies or ontologies. This allows machines to interpret the published data and thus helps end users to find information more efficiently. A large number of data sets and vocabularies have already been published; they form a solid data corpus that can be indexed by (semantic) search engines and serves as a foundation for applications.
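To make the idea of a machine-interpretable description layer concrete, the following minimal sketch represents RDF-style statements as plain (subject, predicate, object) tuples and looks up the values of one property. It is an illustration only: the `example.org` resources, the sample values, and the `objects` helper are hypothetical and do not come from the article; the Dublin Core and FOAF namespace URIs are the published ones.

```python
# Illustrative sketch: RDF-style statements as (subject, predicate, object) tuples.
# The example.org resources and the objects() helper are hypothetical.

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
FOAF = "http://xmlns.com/foaf/0.1/"       # FOAF namespace

REPORT = "http://example.org/report.pdf"      # hypothetical web resource
ALICE = "http://example.org/people/alice"     # hypothetical person resource

triples = {
    (REPORT, DC + "title", "Quarterly Report"),
    (REPORT, DC + "creator", ALICE),
    (ALICE, FOAF + "name", "Alice"),
}


def objects(statements, subject, predicate):
    """Return all objects stated for the given subject and predicate, sorted."""
    return sorted(o for s, p, o in statements if s == subject and p == predicate)


# Follow the creator statement from the document to the person's name.
creators = objects(triples, REPORT, DC + "creator")
creator_names = [name for c in creators for name in objects(triples, c, FOAF + "name")]
```

Because both the document and its creator are identified by URIs and described with shared vocabulary terms, a program can follow the `creator` statement to the person's name without knowing anything about the application that produced the data.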

Recent research in the field of the Semantic Desktop (Blunschi et al., 2007; Groza et al., 2007; Karger, 2007) has shown that a number of features provided by Semantic Web technologies are also suitable for the problem of information management on the desktop, in particular the provision of unified identifiers, the ability to represent data in an application-independent generic format, the flexibility to describe resources using formalized vocabularies, and the possibility to reason over these descriptions. It has also been shown (Sauermann & Heim, 2008; Franz et al., 2009) that the inclusion of semantic technologies on the desktop can significantly improve the user’s perceived quality of personal information management, especially when they are applied over a longer period of time.

However, there exist some significant conceptual differences between the Web and the desktop. First, in contrast to the World Wide Web, the desktop already has a well-established organization metaphor for data: file systems, which have been in use for decades. As a consequence, the vast majority of personal information is stored in files, which are organized using hierarchical, labeled collections (folders or directories) or, to a far more limited extent, using metadata attached to or encoded within files. It is therefore crucial for the Semantic Desktop to integrate smoothly with file systems in a way that allows files to be annotated without breaking the behavior of existing desktop applications. A second major difference is the handling of broken links. While web resources that appear and disappear are, to a certain extent, accepted on the Web, users rightfully expect their data on the desktop to remain consistent over time.

Since the RDF data model has a number of shortcomings that may cause problems for an efficient implementation of the Semantic Desktop, we propose the sile model, a data model that acts as an intermediate and integrative layer between file systems and Semantic Web technologies. This model allows users and applications to annotate and interrelate file-like desktop resources. It is designed as an infrastructure on which applications and services can be built. One example of such a service, a virtual file system, is presented in this paper. Through this virtual file representation, the sile model can be used as a hierarchical file system and thus maintains full backward compatibility with existing systems and applications.
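The requirements named so far (file-like resources that can be annotated and interrelated, links that stay consistent, and a directory-style view for existing applications) can be illustrated with a small sketch. This is not the sile model itself, whose definition is not part of this preview; the `Resource` and `Store` classes, their methods, and the key/value form of annotations are assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the authors' sile model. All class, method,
# and annotation names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A file-like resource: raw content plus annotations and labeled links."""
    content: bytes = b""
    annotations: dict = field(default_factory=dict)  # key -> value
    links: dict = field(default_factory=dict)        # label -> set of target ids


class Store:
    """In-memory store that keeps links consistent when resources are deleted."""

    def __init__(self):
        self._resources = {}

    def create(self, rid, content=b""):
        if rid in self._resources:
            raise KeyError(f"resource already exists: {rid}")
        self._resources[rid] = Resource(content)

    def get(self, rid):
        return self._resources[rid]

    def annotate(self, rid, key, value):
        self._resources[rid].annotations[key] = value

    def link(self, source, label, target):
        # Refuse links to unknown targets so no dangling link can be created.
        if target not in self._resources:
            raise KeyError(f"unknown link target: {target}")
        self._resources[source].links.setdefault(label, set()).add(target)

    def delete(self, rid):
        # Remove the resource and every link that points to it.
        del self._resources[rid]
        for res in self._resources.values():
            for label in list(res.links):
                res.links[label].discard(rid)
                if not res.links[label]:
                    del res.links[label]

    def list_dir(self, key, value):
        """Virtual directory: ids of all resources annotated with key=value."""
        return sorted(
            rid for rid, res in self._resources.items()
            if res.annotations.get(key) == value
        )
```

In this sketch, `list_dir("project", "siles")` plays the role of a virtual folder that an unmodified application could browse, while `delete` removes incoming links so that the remaining data stays consistent, in line with the expectation about broken links described above.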
