New Trends in Graph Mining: Structural and Node-Colored Network Motifs

New Trends in Graph Mining: Structural and Node-Colored Network Motifs

Francesco Bruno, Luigi Palopoli, Simona E. Rombo
Copyright: © 2012 |Pages: 20
DOI: 10.4018/978-1-4666-1785-8.ch015
OnDemand:
(Individual Chapters)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

Open challenges and directions for future research are finally discussed.
Chapter Preview
Top

Introduction

Recent years have witnessed the harvesting of enormous amounts of “rough” biological information which need to be interpreted. Such information include both string-shaped (e.g., nucleic and protein sequences) and complex-structured data, such as protein interaction networks and protein/RNA structures.

One of the main features characterizing biological data is the presence of regularities and repetitions, often associated to common biological functions and/or properties shared by the cellular components of different organisms. That is, biological data is characterized by the presence of “syntactic” features modeling “semantic” properties common to the associated biological components. Such features become interesting when some particular conditions, related to the specific problem under analysis, are satisfied. For example, subsequences repeated more frequently than a fixed threshold in a set of protein sequences may be considered interesting, since this probably denotes that those subsequences embody some specific biological function common to the proteins in the given set.

When the repeated syntactic structures are worth being considered interesting, they are generally called motifs for the biological data in which they repeatedly occur, and it is usually important to look for them and single them out.

The problem of motif discovery has been thoroughly addressed during the last few years, specifically in the context of analyzing string-shaped data, and many important results have been obtained (see, e.g., (Brazma et al. 1998; Rombo et al., 2007) for surveys on this topic). Motif extraction from complex biological structures is relevant from the application viewpoint as well (Apostolico et al., 2008b; Apostolico et al. 2008; Cheng et al. 2008), as further discussed below. Indeed, following the huge efforts paid towards completing the genome coding of several organisms, a large deal of attention is now turning towards the analysis of how cellular components interact in order for biological functions to be realized (von Mering et al., 2002; Miller et al., 2005). Interactions among cellular components are most often modeled by biological networks (Alm et al., 2002), where the nodes represent components (e.g., proteins, genes, enzymes) and edges among nodes denote interactions amongst corresponding components. Different types of biological networks are currently available for analysis purposes, such as protein-protein interaction networks, gene regulatory networks and metabolic networks. Also, several studies (Cheng et al., 2008; Mangan et al., 2005; Milo et al., 2002; Shen-Orr et al., 2002) lead to the important observation that, similarly as with sequences, biological networks can be often understood in terms of basic repeated building blocks that are network motifs. And, for this reason, several techniques have been recently devised to single out motifs of biological networks. The aim of this paper is precisely that of providing a review on the main approaches to network motif extraction presented in the literature.

Preliminarily, it is worth noting that, technically speaking, the problem at hand presents several intrinsic difficulties such as, for instance, the need to deal with (variants of) the sub-graph isomorphism problem (a well known NP-hard problem (Garey et al., 1979)) when searching for identical or quasi-identical sub-graphs within several input graphs. In this respect, it will be interesting to point out how such difficulties have been dealt with in the context of the various proposed techniques. Also, different kinds of network motifs may be defined, referring to both network topology and node similarities, which is conducive to different approaches to motif extraction problems. This issue will also be highlighted in the following.

Complete Chapter List

Search this Book:
Reset