Genetic factors play a major role in the etiology of many human diseases. Genome-wide experimental methods produce an increasing number of genes associated with such diseases. This chapter introduces data sources, bioinformatics tools, and computational methods for prioritizing disease candidate genes and identifying disease pathways. The main strategy is to examine the similarity between candidate genes and known disease genes at the functional level. The authors review different similarity measures and prevailing methods for integrating results from different functional aspects. They hope this chapter will draw attention to the many useful resources that researchers can use to investigate diseases of interest.
Genetic factors play a major role in the etiology of many diseases, including cancers and neurological disorders. Identifying genes that confer increased disease risk, and elucidating the cellular and molecular processes in which these genes participate, are important problems in biomedical research. Genome-wide experimental methods, such as linkage and association studies (Botstein & Risch, 2003) and, more recently, copy number variation (CNV) studies (McCarroll & Altshuler, 2007; Sebat, 2007), all aim to narrow down the genomic regions containing candidate disease genes. However, because of linkage disequilibrium and the limited resolution of genome-wide technologies, a disease-associated region can contain hundreds of candidate genes, and the list of genes produced by such studies is constantly growing. Validating disease-causing genes experimentally with the traditional one-gene-at-a-time approach is time-consuming and expensive. Using computational methods to prioritize disease gene candidates is therefore both important and challenging. Such methods could greatly speed up efforts to elucidate disease mechanisms and, ultimately, to translate genetic findings into effective prevention, diagnosis and treatment.
The recent availability of a large variety of genomic data, together with modern high-throughput technologies, provides unique opportunities and powerful complementary resources for this purpose. Disease-gene relationships are not simple: different diseases may be caused by mutations in the same gene, and the same disease may be caused by mutations in different genes. Nevertheless, disease genes usually share at least some common characteristics, including sequence features, expression patterns, involvement in the same protein-protein interaction sub-network, common Gene Ontology (GO) annotations and shared pathways (Goh et al., 2007; Oti & Brunner, 2007). For example, it was shown that genes involved in the same disease share up to 80% of their annotations in the GO and InterPro databases (Mulder et al., 2007). The similarity among disease genes is not restricted to sequences and annotations; it is also apparent in their functions. This leads to the main strategy for prioritizing disease genes: examining the similarity between candidate genes and known disease genes at the functional level (Han, 2008).
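As a minimal illustration of this strategy, the sketch below ranks candidate genes by how much their annotation sets overlap with those of known disease genes. It is not a method taken from the works cited above. It assumes the Jaccard index as the similarity measure and the mean over known disease genes as the candidate score, and the gene names and annotation terms are hypothetical placeholders rather than real database entries.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two annotation sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def prioritize(
    candidates: Dict[str, Set[str]],
    known: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Rank candidates by mean annotation similarity to known disease genes.

    Assumption: the mean is one of several plausible ways to aggregate
    the per-gene similarities; the maximum would be another.
    """
    scores = []
    for gene, terms in candidates.items():
        sims = [jaccard(terms, known_terms) for known_terms in known.values()]
        score = sum(sims) / len(sims) if sims else 0.0
        scores.append((gene, score))
    # Highest-scoring (most similar) candidates first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical annotation sets; "t1".."t7" stand in for functional terms.
known = {
    "KNOWN_A": {"t1", "t2", "t3"},
    "KNOWN_B": {"t2", "t3", "t4"},
}
candidates = {
    "CAND_1": {"t2", "t3"},
    "CAND_2": {"t5", "t6"},
    "CAND_3": {"t1", "t4", "t7"},
}

ranking = prioritize(candidates, known)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy input, the candidate whose terms all appear among the known disease genes ranks first, and the candidate with no shared terms ranks last. A realistic pipeline would draw the annotation sets from resources such as GO or InterPro and would likely combine several functional aspects rather than a single overlap score.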