1. Introduction
The field of biomedical research has recently seen vast growth in publicly available biomedical resources, including multiple types of datasets and databases, as well as statistical methodologies and analysis tools. A major advance is that researchers now have access to complementary views of a single organism by analyzing multiple types of data, including whole genome sequencing, expression profiling and other high-throughput experiments. Those data, often called ‘-omics’ data, include genome sequencing data (genomics), the complete set of RNA transcripts produced by the genome and analysed via microarray, real-time PCR or Next-Generation Sequencing platforms (transcriptomics), protein structures and functions (proteomics), or any other data available for the organism under study, and provide novel views of cellular components in biological systems (Tsiliki & Kossida, 2011). As a consequence, an enormous amount of digital content is produced every day (i.e. information that is created, captured, or replicated in digital form, as well as hundreds of analysis systems), resulting in high rates of new information being distributed and demanding attention (Karacapilidis, Tzagarakis, Christodoulou, & Tsiliki, 2012).
Most of those data sets are well organised in publicly available databases, although limitations remain in accessing, storing, mapping and managing the increasing amount of data available (Sullivan, Gabbard, Shukla, & Sobral, 2010), which could be overcome with the support of appropriate algorithmic analysis and software tools (Koschmieder, Zimmermann, Tribl, Stoltmann, & Leser, 2011). For instance, cloud and distributed computing, schema-free solutions, domain-specific and process-oriented programming languages, or special statistical algorithmic solutions can be applied (Huttenhower, Schroeder, Chikina, & Troyanskaya, 2008; Pennisi, 2011; Baker, 2012). Apart from meaningful mapping, emphasis has been given to algorithmically unifying the data above (Lukk et al., 2010) and their supplementary views (Joyce & Palsson, 2006). However, given the biological or statistical question of interest, choosing the right datasets, databases and tools for a project is difficult even for an expert (Pennisi, 2011).
Within this environment, research in biostatistics and biomedical fields has become increasingly multidisciplinary and collaborative in nature (Lee, 2007; Baker, 2012). The progressive specialization of resources suggests that the way forward is to form collaborative teams in order to address complex research questions. Such multidisciplinary teams would better meet challenges such as how to store, access, analyze and integrate multiple types of data (Pennisi, 2011); how to work with multiple databases simultaneously (Finholt, 2003); or how to make data accessible and usable to life sciences researchers (Sullivan, Gabbard, Shukla, & Sobral, 2010). In addition, tools that facilitate sense-making and decision-making by appropriately capturing the collective intelligence that emerges during such collaboration are lacking. Biomedical researchers need such tools to collaborate and make decisions efficiently and effectively by appropriately assembling and analyzing enormous volumes of complex, multi-faceted data residing in different sources. Supporting team collaboration under such circumstances is still considered a challenging task (Spencer, Zimmerman, & Abramson, 2011). Towards this goal, we present a Web-based approach to support communities of bio-scientists during their scientific collaboration, which is being developed in the context of the ongoing Dicode FP7 EU project (http://dicode-project.eu/).