Building Gene Networks by Analyzing Gene Expression Profiles

Crescenzio Gallo

Source Title: Encyclopedia of Information Science and Technology, Fourth Edition

DOI: 10.4018/978-1-5225-2255-3.ch039

Abstract

The possible applications of modeling and simulation in bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure the expression levels of thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that show similar expression patterns. In this chapter we examine various methods for analyzing gene expression data, addressing three important topics: (1) selecting the most differentially expressed genes, (2) grouping them according to their relationships, and (3) classifying samples based on gene expression.

Chapter Preview

Introduction

Since the composition of DNA was determined, our understanding of biological structures and processes has expanded greatly, thanks largely to computer science, which plays a fundamental role in bioinformatics. The main goal at present is to analyze and exploit the huge amount of available data. It is particularly important to distinguish among diseases by selecting informative gene markers of the disease state and by identifying possible correlations between genes.

Data analysis is regarded as the largest, and possibly the most important, area of microarray bioinformatics for reaching these goals. Specific data analysis methods address the fundamental scientific questions about microarray data:

1. Which genes are differentially expressed in one set of samples relative to another?
2. What are the associations between the genes or samples being observed?
3. Is it possible to group samples based on gene expression values?

In the next section, we present the basic concepts underlying these questions and bioinformatics research in general. We then describe methods for the first question: the search for differentially expressed (up- or down-regulated) genes. The following sections address the other two topics, clustering and classifying gene profiles. Finally, we discuss some concerns and issues of interest for future study and development in the field of (microarray) bioinformatics.
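To make the first question concrete, the following is a minimal sketch of one common way to rank genes by differential expression between two sample groups, using Welch's t-statistic. It is an illustration under stated assumptions, not necessarily the method used in the chapter: the function names (`welch_t`, `rank_genes`), the toy gene names, and the expression values are all hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances).

    Assumes each group has at least two values and nonzero total variance.
    """
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def rank_genes(expr, group_a, group_b):
    """Rank genes by |t|, most differentially expressed first.

    expr maps a gene name to a list of expression values (one per sample);
    group_a and group_b are lists of sample indices.
    """
    scores = {}
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy data: three genes measured in six samples.
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # lower in the first group
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # essentially unchanged
    "geneC": [4.0, 3.8, 4.1, 1.0, 1.3, 0.9],  # higher in the first group
}
ranking = rank_genes(expr, group_a=[0, 1, 2], group_b=[3, 4, 5])
for gene, t in ranking:
    print(f"{gene}: t = {t:.2f}")
```

The sign of the statistic indicates the direction of the change (positive when expression is higher in the first group), while its absolute value drives the ranking. In a real analysis with thousands of genes, the resulting p-values would also need a multiple-testing correction.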

Background

Gene expression profiling is an extensively used method in the analysis of microarray data. The leading hypothesis is that genes with similar expression profiles are co-regulated and are probably connected functionally.

Cluster analysis helps reach these objectives. In particular, gene expression clusters help characterize the unknown genes assigned to a cluster by means of the genes in that cluster whose function is known, and they provide the basis for identifying common upstream regulatory sequence elements (Brazma et al., 2000).
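As a minimal sketch of how similarity between expression profiles might be quantified, the example below computes the Pearson correlation between pairs of gene profiles and keeps the pairs above a cutoff as candidate co-expression links. The function names, the 0.9 threshold, and the toy profiles are assumptions made for illustration; the chapter does not prescribe them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return (gene, gene, r) for every pair whose correlation is >= threshold.

    The threshold is a hypothetical cutoff chosen only for this example.
    """
    genes = sorted(profiles)
    pairs = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if r >= threshold:
                pairs.append((g, h, r))
    return pairs

# Hypothetical expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.1, 3.9, 6.2, 8.0],  # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],  # falls as g1 rises
}
pairs = coexpressed_pairs(profiles)
for g, h, r in pairs:
    print(f"{g} - {h}: r = {r:.3f}")
```

Only the g1/g2 pair passes the cutoff here. Using the absolute value of the correlation instead would also link anti-correlated genes such as g1 and g3, which is a design choice that depends on the biological question.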

Clustering of expression profiles and functional grouping is especially compelling if the complete gene set is known. Hence, we used the large publicly available data set included in the Stanford Yeast Database at http://genome-www.stanford.edu for our clustering study.

Many applications aim at the molecular classification of diseases based on gene expression profiling and clustering. See, for example, works on leukemias (Golub et al., 1999) and B-cell lymphomas (Alizadeh et al., 2000). These and other studies confirm the usefulness of microarray bioinformatics for scientific and industrial research.

Key Terms in this Chapter

Clustering: Clustering, or cluster analysis, is a set of multivariate data analysis techniques aimed at selecting and grouping homogeneous elements in a data set. Clustering techniques are based on measures of similarity between elements. In many approaches this similarity, or rather dissimilarity, is expressed as a distance in a multidimensional space. Clustering algorithms group items on the basis of their mutual distance, so whether an element belongs to a cluster depends on how far it lies from that cluster.
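The distance-based grouping described in this definition can be sketched with a small agglomerative (bottom-up) clustering routine. This is one illustrative technique among many, here with Euclidean distance and average linkage; the function names and the toy points are hypothetical, and the quadratic search is written for clarity rather than efficiency.

```python
import math

def average_linkage(c1, c2, points):
    """Mean Euclidean distance between all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical two-dimensional expression profiles forming two obvious groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9)]
clusters = agglomerate(points, k=2)
print(clusters)
```

Each element ends up in the cluster it is closest to on average, which mirrors the idea that membership depends on the distance between the element and the collection.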

Pattern: In biology, a pattern (sometimes “profile”) refers to various types of regularity: the regularity of DNA or protein sequences that allows recognition and specific binding between molecules; the regularity in gene expression levels that allows different cell types, including tumor cell types, to be recognized in experiments; the regularity of events that occur during processes such as the development of an organism; or even regularities in animal behavior.

DNA Microarray: A DNA microarray (commonly known as a gene chip, DNA chip, biochip, or high-density array) is a collection of microscopic DNA probes attached to a solid surface such as glass, plastic, or a silicon chip, forming an array (matrix). Such arrays make it possible to examine simultaneously the presence of many genes within a DNA sample (which can often represent the entire genome or transcriptome of an organism). A typical use is to compare the gene expression profile of a patient with that of a healthy individual to identify which genes are involved in the disease.

Gene Expression: In molecular biology, gene expression profiling is the measurement of the activity (expression) of thousands of genes at once, to create a global picture of cellular function. These profiles can, for example, distinguish cells that are proliferating, or show how cells react to a particular treatment. Many experiments of this type measure an entire genome simultaneously. DNA microarray technology measures the relative activity of previously identified target genes.

DNA: Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic information necessary for the biosynthesis of the RNA and protein molecules essential for the development and proper functioning of most living organisms. The sequential order of the nucleotides A, T, C, and G encodes the genetic information, which is translated via the genetic code into the corresponding amino acids.

Algorithm: An algorithm is a procedure that solves a given problem in a finite number of steps. A problem that can be solved by an algorithm is said to be computable. The term “algorithm” derives from the Latin transcription of the name of the Persian mathematician al-Khwarizmi, who is considered one of the first authors to have referred to this concept.

Gene: The gene is the fundamental hereditary unit of living organisms. Genes correspond to portions of the genetic code located at specific positions within the sequence (DNA or, more rarely, RNA) and contain all the information necessary for the production of a protein. They are contained and organized within chromosomes, which are present in all cells of an organism.

Cluster: Natural subgroup of a population, used for statistical sampling or analysis.

Artificial Neural Network: Artificial neural networks (ANNs) are mathematical models that represent the interconnection between elements called artificial neurons, i.e., mathematical constructs that to some extent mimic the properties of living neurons. These models can be used to gain an understanding of biological neural networks and, even more, to solve engineering problems of artificial intelligence such as those that arise in various technological fields (electronics, computer science, simulation, and other disciplines).

Bioinformatics: Bioinformatics is a scientific discipline devoted to solving biological problems at the molecular level using computational methods. It is an attempt to describe biological phenomena in numerical and statistical terms with a set of analytical and numerical tools. In addition to information technology, bioinformatics uses applied mathematics, statistics, chemistry, biochemistry, and concepts of artificial intelligence. Bioinformatics mainly deals with providing valid statistical models for the interpretation of data from experiments in molecular biology and biochemistry in order to identify trends and numerical laws; generating new models and mathematical tools for the analysis of DNA, RNA, and protein sequences in order to create a body of knowledge concerning the frequency of relevant sequences, their evolution, and their possible function; and organizing the knowledge acquired at the global level in genome and proteome databases in order to make such data accessible to all, while optimizing data search algorithms to improve accessibility.
