Introduction
Biological Problem Solving Environments (PSEs) are frequently provided as web applications. In the trivial case, such a PSE provides a single service or a limited set of services as a web application. Biologists “connect” these services manually by cutting and pasting data between web sites. Since this process is error-prone and hard to retrace, a plethora of different bioinformatics workflow systems have emerged (Taylor, Deelman, Gannon, & Shields, 2006; Tiwari & Sekhar, 2007). Increasingly, web portals are used to access and to share scientific workflows (De Roure, Goble, & Stevens, 2007; Christie & Marru, 2007). Some portals provide workflow construction facilities via Java Web Start (Christie & Marru, 2007; Sipos & Kacsuk, 2005), while truly web-based workflow construction tools lack a sophisticated user experience (Carrere & Gouzy, 2006; Bartocci, Corradini, Merelli, & Scortichini, 2007). Often, scientific workflow management systems are provided as desktop applications with rather complex user interfaces (Oinn et al., 2004; Shah et al., 2004).
As with programming languages, the complexity of the user interface and of the workflow notation remains an area of conflict. Supporting a large set of different workflows and different types of services increases the complexity of a workflow language, thus reducing its usability for domain experts. Conversely, a highly user-friendly system limits the set of possible workflow patterns and restricts access to arbitrary services. Domain experts should be able to compose and execute workflows in a domain-specific modeling system, and still fall back on collaboration with software engineers if a more complex workflow model is needed. In this paper we propose a hybrid approach of augmenting an existing collaborative workflow development system with a biology-specific scientific workflow mode. In software engineering it is often recognized that 80% of the useful functionality is provided by 20% of the code (the “Pareto principle”).
We argue that many typical life science workflows can be expressed by a reduced workflow notation. Similar to high-level programming languages, which are compiled to a common machine language, we regard the Business Process Execution Language (BPEL) as the “assembly language” of an arbitrary set of domain-specific workflow languages (OASIS, 2007).
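To make the compiler analogy concrete, the following minimal sketch lowers a reduced, purely linear workflow description to a skeletal WS-BPEL sequence. It is an illustration only and does not reproduce the notation or the translation used in MoBiFlow: the function name `compile_pipeline`, the tuple-based input format, the variable naming scheme, and the example service names are all hypothetical, and the generated document omits the partner link, variable, and WSDL declarations that an executable BPEL process would require.

```python
import xml.etree.ElementTree as ET

# Namespace of executable processes in WS-BPEL 2.0.
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"


def compile_pipeline(name, steps):
    """Lower a linear list of (service, operation) steps to a BPEL <sequence>.

    The input notation is hypothetical. The result is only a skeleton:
    it contains no partner link declarations, variables, or imports.
    """
    ET.register_namespace("bpel", BPEL_NS)
    process = ET.Element(f"{{{BPEL_NS}}}process", {"name": name})
    sequence = ET.SubElement(process, f"{{{BPEL_NS}}}sequence")

    # Chain the steps: the output of each step is the input of the next.
    previous_output = "input"
    for index, (service, operation) in enumerate(steps):
        output = f"result{index}"
        ET.SubElement(sequence, f"{{{BPEL_NS}}}invoke", {
            "partnerLink": service,
            "operation": operation,
            "inputVariable": previous_output,
            "outputVariable": output,
        })
        previous_output = output
    return process


if __name__ == "__main__":
    # Hypothetical two-step workflow: a search followed by an alignment.
    workflow = [("BlastService", "search"), ("AlignService", "align")]
    print(ET.tostring(compile_pipeline("demo", workflow), encoding="unicode"))
```

The point of the sketch is the division of labor: the domain expert only states which services run in which order, while the wiring of intermediate variables, which a low-level workflow language makes explicit, is generated mechanically.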
In this article, we give an overview of our previous work on the bioinformatics workflow system MoBiFlow and the software ecosystem on which it is built. The article extends our workshop paper (Küchlin & Held, 2010). MoBiFlow consists of the biology-specific workflow system Calvin and the collaborative workflow development system Hobbes, with additional tools such as the computation of workflow metrics, layout capabilities or Web 2.0 based video conferencing.
In particular, we make the following contributions:
1. We have formalized common e-science processes in biology as a meta-process and we derive requirements for next generation bioinformatics workflow systems.
2. We show how low-level workflow languages can be combined with domain-specific workflow notations to provide ease-of-use and flexibility at the same time.
3. We describe the design principles behind the MoBiFlow system, which represents a solution for high-level and low-level collaborative workflow development and usage.