Dealing with Structure Heterogeneity in Semantic Collaborative Information Systems

Eva Zangerle (University of Innsbruck, Austria) and Wolfgang Gassler (University of Innsbruck, Austria)
DOI: 10.4018/978-1-4666-0894-8.ch001


The creation of content within semistructured, collaborative information systems raises the problem of dealing with very heterogeneous schemata. This is because the semistructured paradigm does not restrict users in their choice of nomenclature for the data they intend to store within the information system. As many users participate in the creation of data, the structure of this data becomes very heterogeneous. In this chapter, the authors discuss two main approaches to dealing with heterogeneity. The first approach aims at efficiently avoiding structure heterogeneity within collaborative information systems by providing users with suitable recommendations for an aligned schema during the insertion process. The second approach focuses on overcoming structure heterogeneity by providing efficient means for querying heterogeneous data.

Most online, collaborative information systems, such as wiki systems, provide means to easily add, modify and delete information that does not have to adhere to any predefined schema or structure. In contrast, traditional (relational) databases are strictly structured and force the user to store information in a predefined schema. Such structured data stores offer the major advantage of structured access, which enables complex query capabilities. Traditional wiki systems only support full-text search, which is not suitable for answering complex queries such as “Which Austrian cities have more than 10,000 inhabitants and have a female mayor who has a doctoral degree?” Nevertheless, wiki systems are able to cope with very large amounts of collaboratively created information with very heterogeneous structures and schemata.
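To illustrate why structured access makes such a query straightforward, the following minimal sketch assumes a fixed, strictly structured schema expressed as two hypothetical types, City and Person, and evaluates the example question as a filter over typed attributes. A full-text search over keywords such as "Austria" and "mayor" cannot evaluate the numeric comparison on inhabitants or follow the link from a city to its mayor's attributes. The records are made up for illustration only; this is not the system described by the authors.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    gender: str
    has_doctorate: bool


@dataclass
class City:
    # Hypothetical fixed schema: every city record has exactly these attributes.
    name: str
    country: str
    inhabitants: int
    mayor: Person


def matching_cities(cities: list[City]) -> list[str]:
    """Return names of Austrian cities with more than 10,000 inhabitants
    whose mayor is female and holds a doctoral degree."""
    return [
        c.name
        for c in cities
        if c.country == "Austria"
        and c.inhabitants > 10_000
        and c.mayor.gender == "female"
        and c.mayor.has_doctorate
    ]


# Hypothetical, made-up records used only for illustration.
cities = [
    City("Town A", "Austria", 25_000, Person("Mayor A", "female", True)),
    City("Town B", "Austria", 8_000, Person("Mayor B", "female", True)),
    City("Town C", "Austria", 40_000, Person("Mayor C", "male", True)),
    City("Town D", "Germany", 50_000, Person("Mayor D", "female", True)),
]

print(matching_cities(cities))  # ['Town A']
```

The query works only because every record is guaranteed to use the same attribute names and types, which is exactly the guarantee a schema-free wiki does not give.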

Weikum et al. (2009) observed that modern information systems must support both structured and unstructured data in order to combine the advantages of both worlds and to answer such complex questions. This need has driven the development of collaborative, semistructured information systems, which provide mechanisms for combining unstructured and structured storage of data. Semistructured data has a structure, but does not require a fixed schema to be specified. As this paradigm places no restrictions on the user or on the schema used, the massive collaborative creation and editing of content by hundreds or thousands of users inevitably leads to very heterogeneous schemata and structures. Even Wikipedia, whose highly committed community actively deals with heterogeneity, is not able to avoid heterogeneity within its schema.
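The effect of this freedom on querying can be shown with a naive sketch. It assumes hypothetical semistructured records, stored as plain dictionaries, in which different users named the same property differently ("inhabitants", "population", "Population", "pop.") and even stored its value with different types. A query that relies on one attribute name silently misses most matching records. The hand-written synonym table below is an illustrative assumption only; it is not either of the approaches discussed in this chapter, and maintaining such a table manually is precisely what does not scale in a large collaborative system.

```python
# Hypothetical semistructured records: each user chose their own attribute
# names, and one user stored the value as a string instead of an integer.
records = [
    {"name": "Town A", "inhabitants": 25_000},
    {"name": "Town B", "population": 30_000},
    {"name": "Town C", "Population": "12000"},
    {"name": "Town D", "pop.": 9_000},
]


def naive_query(records, attribute, threshold):
    """Match only records that use exactly the given attribute name."""
    return [r["name"] for r in records
            if attribute in r and int(r[attribute]) > threshold]


# Hand-written synonym table (an illustrative assumption, not a real vocabulary).
SYNONYMS = {"inhabitants": {"inhabitants", "population", "Population", "pop."}}


def query_with_synonyms(records, attribute, threshold):
    """Match records that use any attribute name listed as a synonym."""
    names = SYNONYMS.get(attribute, {attribute})
    result = []
    for r in records:
        # dict_keys supports set intersection with a regular set.
        for key in r.keys() & names:
            if int(r[key]) > threshold:
                result.append(r["name"])
                break
    return result


print(naive_query(records, "inhabitants", 10_000))          # ['Town A']
print(query_with_synonyms(records, "inhabitants", 10_000))  # ['Town A', 'Town B', 'Town C']
```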

In the following sections, we discuss the problem of heterogeneity in semistructured information systems and present approaches that are able to deal with heterogeneous schemata and data, as well as with the collaborative paradigm of creating and managing knowledge and information.

