Semantic Integration for Research Environments

Tomasz Gubala (University of Amsterdam, The Netherlands and ACC CYFRONET AGH, Poland), Marian Bubak (Institute of Computer Science AGH, Poland) and Peter Sloot (University of Amsterdam, The Netherlands)
DOI: 10.4018/978-1-60566-374-6.ch026


Research environments for modern, cross-disciplinary scientific endeavors have to unite multiple users, with varying levels of expertise and roles, along with multitudes of data sources and processing units. The high level of required integration contrasts with the loosely-coupled nature of environments which are appropriate for research. The problem is to support integration of dynamic service-based infrastructures with data sources, tools and users in a way that preserves ubiquity, extensibility and usability. This chapter presents a close examination of related achievements in the field and a description of the proposed approach. It shows that integrating loosely-coupled system components through formally-defined vocabularies may fulfill the listed requirements. The authors demonstrate that combining formal representations of domain knowledge with techniques such as data integration, semantic annotations and shared vocabularies enables the development of systems for modern e-Science. As a demonstration, they present how several semantically-augmented experiments are modeled in the ViroLab virtual laboratory for virology.
Chapter Preview


Modern research in life sciences that aims at understanding system-level phenomena imposes serious requirements on computer systems designed to support researchers. The growing complexity of scientific issues, combined with the collaborative aspects of virtual research environments, demands that developed systems be more aware of the modeled knowledge domain than at any point in the past. In response, the virtualization and enhancement of human activities with new capabilities offered by cyberinfrastructure become visible in the life sciences, particularly in the construction of modern virtual laboratories, problem-solving environments and other types of systems for e-Science. Just like traditional scientific research facilities, such solutions aim at bringing together researchers in certain domains to achieve breakthrough discoveries (Foster, 2006). They also involve equipment and enable the users of this equipment to collaborate using a common language: definitions, terms and procedures. However, in contrast to traditional research facilities, modern virtual laboratories are geographically dispersed and usually integrate people at different institutions, with varying levels of technical and scientific expertise, working simultaneously on collaborative applications.

There is an obvious need for a mechanism that could maintain the integration of human knowledge, tools, data, tasks and results on a level that is understandable for both humans and computer systems. However, the problem is that current approaches to integration are not flexible enough to support loosely-coupled, dynamically-changing environments with multi-type users (i.e. numerous users performing different tasks on the same infrastructure) in a way that preserves a crucial set of features (Gil, 2007). The required features of such an integration mechanism are:

  1. Ubiquity, since all resources form a single laboratory in the minds of its users;

  2. Extensibility, as both users and their tools evolve together with new scientific challenges;

  3. Friendliness for users, who would rather communicate with systems using terms of their own domain of science.

In this chapter, based on a thorough review of related research endeavors and on our own experiences in the life sciences domain, we show that a system design methodology relying heavily on domain semantics may provide a sound tool for building and operating modern e-Science systems. Moreover, we argue that such a system would support easier content creation (domain-specific services, in-silico experiments, data sources).

The core idea is to apply domain knowledge as an omnipresent integration glue that not only helps users interact with and understand the system, but also (and this is where the novelty lies) interweaves the interaction of distinct system modules. We consider a semantic layer, integrated with a system on the middleware level. As Gardner (2005, pp. 1003-1004) claims, a well-devised set of ontologies may form a glue-like middle layer that integrates all possible data sources of a system (or of an organization, for that matter). The work also postulates (p. 1007) that this kind of semantic-based middleware could be the future of knowledge-driven organizations (such as pharmaceutical companies) which try to cope with their intensive R&D data growth. We argue that a very similar approach may be successfully extended not only to the data-related assets of an organization, but also to services, tools, messages, user interfaces and any elements of IT systems which deal with semantically-enriched data.

The innovation we propose comes as an evolution rather than a revolution of the general trend in semantic technologies. A thorough survey conducted by Cardoso (2007) indicates rising interest in semantic (web) technologies among early adopters, both in academia and industry. The entire field of ontological modeling and semantic systems integration is already shifting its focus from research to real market applications. As will be shown in this chapter, the life sciences are no exception to this trend and are, in fact, poised to assume a leading role in the uptake of semantic technologies.

Key Terms in this Chapter

Semantics Injection: is any process that aims at attaching knowledge to artificial entities modeled by a computer system. The knowledge is usually a piece of metadata that relates a virtual element inside the system to its real counterpart in the modeled domain of human knowledge.

Semantic (Resource) Discovery: is a process where some entities (documents, web content, services) are browsed or searched with specific consideration of their meaning in a certain domain of human knowledge. For instance, one may semantically search for “annual management report file” (in contrast to searching for “.doc file” which is a pure syntax-based search).
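The contrast between syntax-based and semantics-based search can be sketched with a minimal, library-free triple store. All identifiers below (the `ex:` concepts and file names) are illustrative assumptions, not terms from the chapter or any real vocabulary:

```python
# Minimal triple store: each fact is a (subject, predicate, object) tuple.
# The ex:/rdf: names are illustrative, not a real published vocabulary.
triples = [
    ("ex:doc1", "rdf:type", "ex:AnnualManagementReport"),
    ("ex:doc1", "ex:fileName", "minutes.doc"),
    ("ex:doc2", "rdf:type", "ex:LabProtocol"),
    ("ex:doc2", "ex:fileName", "report.doc"),
]

def syntactic_search(suffix):
    """Pure syntax: match on the file name extension only."""
    return sorted(s for s, p, o in triples
                  if p == "ex:fileName" and o.endswith(suffix))

def semantic_search(concept):
    """Semantic discovery: match on the asserted meaning of the resource."""
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and o == concept)

print(syntactic_search(".doc"))                      # both documents match
print(semantic_search("ex:AnnualManagementReport"))  # only the actual report matches
```

The syntactic query cannot distinguish a lab protocol that merely happens to be named "report.doc" from a genuine report; the semantic query can, because it consults the stated meaning rather than the surface form.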

Collaboratory: is a short, handy name for “collaborative laboratory” and denotes a virtual location or system where scientific experimental research is performed in a collaborative manner, i.e. involving multiple individuals in one experiment.

Semantic System Integration: is used to descriptively express the methodology of building computer systems where the meaning of modeled entities (data, components, documents, actions etc.) has a significant impact on the way the realized system behaves and operates (also internally).

Semantic Web: is a wide-ranging term describing one of the possible future steps of evolution of the World Wide Web. Though definitions of Semantic Web vary, they usually involve enriching documents and tools that form the web with domain-related information so their semantics is not only obvious to human users but is also comprehensible for other computer programs.

Semantic Annotation: is a piece of (structured or not) human-readable information (which can also be computer-readable) that (1) constitutes a metadata description of the annotation subject and (2) relates to the knowledge domain of the reality modeled by the system (Bechhofer, 2002).

Semantic Data Integration: is a mechanism which associates different sources of data on the basis of the meaning of data content. This is usually applied to merging (logically or physically) the content of different (distributed) data sources so that an end user may use all the sources through some unified mechanism. This mechanism sometimes also tackles the problem of data source heterogeneity.
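The logical-merge variant of this mechanism can be sketched in a few lines: two sources describe the same kind of entity under different local field names, and a shared vocabulary maps both onto common concepts. The source schemas, mapping tables and concept names here are illustrative assumptions only:

```python
# Two heterogeneous sources describing patients; field names differ per source.
source_a = [{"pid": "p1", "virus": "HIV-1"}]
source_b = [{"patient_id": "p2", "pathogen": "HIV-2"}]

# Shared vocabulary: map each source's local fields onto common concepts.
mappings = {
    "a": {"pid": "patientId", "virus": "infection"},
    "b": {"patient_id": "patientId", "pathogen": "infection"},
}

def integrate(records, mapping):
    """Rewrite each record into the shared vocabulary (a logical merge)."""
    return [{mapping[k]: v for k, v in rec.items()} for rec in records]

unified = integrate(source_a, mappings["a"]) + integrate(source_b, mappings["b"])
print(unified)  # both sources now answer queries phrased in the shared terms
```

An end user can now query `patientId` and `infection` uniformly, without knowing which source used which local schema; this is the heterogeneity problem the definition alludes to, reduced to its simplest form.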

Annotation Economy: is used loosely to denote the category that gathers both producers (human or artificial) and consumers (likewise of both types) of broadly-understood annotations. The term is introduced to describe a future state of knowledge-enriched computer systems in which annotations, as entities that carry crucial information, play a first-class role in building and operating systems (Kiryakov, 2004, p. 50).

Task-Oriented Design: is a descriptive term for a methodology of building computer systems which stresses the various tasks that users would like to perform inside the systems and then follows this separation during the development phase. While task-oriented design is not formally defined, the term is typically used to describe the approach itself, rather than any specific kind of design methodology.

Ontology: is a formalized means to express some part of reality in a well-organized manner. One of the most important qualities of ontologies is that they are legible to artificial entities such as computer programs. Since this formalization brings inevitable simplification and classification of the modeled domain, we say ontologies are a form of abstraction through conceptualization.
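The "legible to computer programs" quality can be made concrete with a toy taxonomy: once subclass relations are stated formally, a program can answer is-a questions the authors of the individual facts never wrote down explicitly. The concept names and hierarchy below are a hypothetical sketch, not an ontology from the chapter:

```python
# A toy taxonomy (class -> parent class); concept names are illustrative.
subclass_of = {
    "Retrovirus": "Virus",
    "Virus": "Pathogen",
    "Bacterium": "Pathogen",
}

def is_a(concept, ancestor):
    """Machine-legible reasoning: walk the subclass chain upward."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = subclass_of.get(concept)
    return False

print(is_a("Retrovirus", "Pathogen"))  # True: Retrovirus -> Virus -> Pathogen
print(is_a("Bacterium", "Virus"))      # False: sibling branches do not match
```

Real ontology languages support far richer constructs (properties, restrictions, disjointness), but even this minimal transitive-closure check illustrates "abstraction through conceptualization": individual resources are classified once, and conclusions follow mechanically.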
