This chapter provides an overview of the computational approaches developed for exploring the modular organization of protein interaction networks. Special emphasis is placed on the module finding tools implemented in three freely available software packages, VisANT, Cytoscape and MATISSE, as well as on their biomedical applications. The selected methods are presented in the broader context of module discovery options, ranging from approaches that rely solely on topological properties of the underlying network to those that also take into account complementary data sources, such as the mRNA levels of the genes encoding the proteins. The author also highlights some current limitations in the measured network data that should be understood when developing and applying module finding methodology, and discusses some key future trends and promising research directions with potential implications for clinical research.
Recent advances in experimental technologies and computational methods have made it possible to measure and predict protein-protein interactions on a global scale (see Chapters I-VI and Shoemaker & Panchenko, 2007a, b). While the large-scale interaction datasets can provide an unprecedented glimpse into the cellular mechanisms underlying the behavior of various biological systems, the increasing sizes and densities of the protein interaction networks available today also pose many challenging computational problems. In particular, the inherent complexity of most biological processes and the large number of possible interactions involved can make it difficult to interpret and mine the interaction networks by eye alone, even with the help of the sophisticated visualization and layout tools available. Software packages that implement computational tools for more advanced network data mining can facilitate exploratory network analysis by identifying the key players, and the interactions among them, that contribute to the cellular processes of interest. This may make it possible, for example, to pinpoint errors in experimentally or computationally derived interaction links, to identify proteins directly involved in a particular process, and to formulate hypotheses for follow-up experiments. It should be realized, however, that even though the rapidly developing theory of complex networks has been applied successfully to the analysis of various social and technological networks, such as the Internet and computer chips, its application to molecular interaction networks is still an emerging area of research, and these methods should therefore be considered experimental. At the moment, the computational tools are best used together with network visualization and analysis software that enables interactive and fully controlled mining of complex protein interaction networks.
One of the most fundamental properties found in many biological networks is their modular organization (Hartwell et al., 1999). Consequently, the decomposition of large networks into a hierarchy of possibly overlapping sub-networks (so-called modules) has become a principal analytical approach to dealing with the complexity of large cellular networks (Barabási & Oltvai, 2004). In protein interaction networks, a functional module refers to a group of physically connected proteins that work together to carry out a specific cellular function in a particular spatio-temporal context. A large number of computational tools of increasing complexity have recently been developed for investigating the modular organization of interaction networks. These tools can not only identify whether a given network is modular, but also detect the modules and their inter-relationships in the underlying network. By relating the sub-networks found to complementary functional genomics or proteomics data, such as gene expression profiles from genome-wide microarray experiments or protein abundance measurements from mass-spectrometry-based assays, it is also possible to identify a hierarchy of connected groups of components that show coherent expression patterns. Such functionally organized modules can not only emphasize the biological meaning of the modules discovered, but also allow us to focus gradually on the active subsystems of particular interest (active modules), which can lead to concrete hypotheses about the regulatory mechanisms and pathways most important for the given process (Ideker et al., 2002). Moreover, these modules can subsequently be used in predictive modeling studies, with the aim of suggesting new biological hypotheses, such as previously unexplored interactions or the function of individual components, or even of distinguishing different biomedical phenotypes (discriminative modules).
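To make the notion of "whether a given network is modular" more concrete, the following minimal sketch computes the Newman-Girvan modularity score Q for a candidate partition of a small undirected network. This is only one commonly used topological measure and is not necessarily the criterion implemented in VisANT, Cytoscape or MATISSE. The toy network, the protein labels and the function name `modularity` are hypothetical and serve illustration only.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity Q of a partition of an undirected,
    unweighted network given as a list of (u, v) edges.

    membership maps each node to a module label (illustrative input format).
    """
    m = len(edges)
    degree = defaultdict(int)      # node -> number of incident edges
    internal = defaultdict(int)    # module -> edges with both ends inside it
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1

    degree_sum = defaultdict(int)  # module -> summed degree of its nodes
    for node, k in degree.items():
        degree_sum[membership[node]] += k

    # Q = sum over modules of (fraction of edges inside the module)
    #     minus (expected fraction under random rewiring with fixed degrees)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy network: two tightly connected triplets of proteins
# joined by a single interaction (C-D).
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]

two_modules = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_module = {node: 1 for node in "ABCDEF"}

q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(f"Q (two modules) = {q_two:.3f}")
print(f"Q (one module)  = {q_one:.3f}")
```

A partition that follows the two dense triplets scores higher than the trivial single-module partition, which is the sense in which a purely topological method can prefer one decomposition over another. Methods that also use expression data would add a further, data-driven term to such a score.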