New Trends in Graph Mining: Structural and Node-Colored Network Motifs

New Trends in Graph Mining: Structural and Node-Colored Network Motifs

Francesco Bruno (DEIS, Università della Calabria, Italy), Luigi Palopoli (DEIS, Università della Calabria, Italy) and Simona E. Rombo (DEIS, Università della Calabria, Italy)
Copyright: © 2010 |Pages: 19
DOI: 10.4018/jkdb.2010100206
OnDemand PDF Download:
List Price: $37.50


Searching for repeated features characterizing biological data is fundamental in computational biology. When biological networks are under analysis, the presence of repeated modules across the same network (or several distinct ones) is shown to be very relevant. Indeed, several studies prove that biological networks can be often understood in terms of coalitions of basic repeated building blocks, often referred to as network motifs.This work provides a review of the main techniques proposed for motif extraction from biological networks. In particular, main intrinsic difficulties related to the problem are pointed out, along with solutions proposed in the literature to overcome them. Open challenges and directions for future research are finally discussed.
Article Preview


Recent years have witnessed the harvesting of enormous amounts of “rough” biological information which need to be interpreted. Such information include both string-shaped (e.g., nucleic and protein sequences) and complex-structured data, such as protein interaction networks and protein/RNA structures.

One of the main features characterizing biological data is the presence of regularities and repetitions, often associated to common biological functions and/or properties shared by the cellular components of different organisms. That is, biological data is characterized by the presence of “syntactic” features modeling “semantic” properties common to the associated biological components. Such features become interesting when some particular conditions, related to the specific problem under analysis, are satisfied. For example, subsequences repeated more frequently than a fixed threshold in a set of protein sequences may be considered interesting, since this probably denotes that those subsequences embody some specific biological function common to the proteins in the given set.

When the repeated syntactic structures are worth being considered interesting, they are generally called motifs for the biological data in which they repeatedly occur, and it is usually important to look for them and single them out.

The problem of motif discovery has been thoroughly addressed during the last few years, specifically in the context of analyzing string-shaped data, and many important results have been obtained (see, e.g., (Brazma et al. 1998; Rombo et al., 2007) for surveys on this topic). Motif extraction from complex biological structures is relevant from the application viewpoint as well (Apostolico et al., 2008b; Apostolico et al. 2008; Cheng et al. 2008), as further discussed below. Indeed, following the huge efforts paid towards completing the genome coding of several organisms, a large deal of attention is now turning towards the analysis of how cellular components interact in order for biological functions to be realized (von Mering et al., 2002; Miller et al., 2005). Interactions among cellular components are most often modeled by biological networks (Alm et al., 2002), where the nodes represent components (e.g., proteins, genes, enzymes) and edges among nodes denote interactions amongst corresponding components. Different types of biological networks are currently available for analysis purposes, such as protein-protein interaction networks, gene regulatory networks and metabolic networks. Also, several studies (Cheng et al., 2008; Mangan et al., 2005; Milo et al., 2002; Shen-Orr et al., 2002) lead to the important observation that, similarly as with sequences, biological networks can be often understood in terms of basic repeated building blocks that are network motifs. And, for this reason, several techniques have been recently devised to single out motifs of biological networks. The aim of this paper is precisely that of providing a review on the main approaches to network motif extraction presented in the literature.

Preliminarily, it is worth noting that, technically speaking, the problem at hand presents several intrinsic difficulties such as, for instance, the need to deal with (variants of) the sub-graph isomorphism problem (a well known NP-hard problem (Garey et al., 1979)) when searching for identical or quasi-identical sub-graphs within several input graphs. In this respect, it will be interesting to point out how such difficulties have been dealt with in the context of the various proposed techniques. Also, different kinds of network motifs may be defined, referring to both network topology and node similarities, which is conducive to different approaches to motif extraction problems. This issue will also be highlighted in the following.

Complete Article List

Search this Journal:
Open Access Articles: Forthcoming
Volume 7: 2 Issues (2017)
Volume 6: 2 Issues (2016)
Volume 5: 2 Issues (2015)
Volume 4: 2 Issues (2014)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing