Mapping the Chromosome through a Novel Use of GIS and Spatial Analysis

Jane L. Garb, D. Joseph Jerry, Mary J. Hagen, Jennifer Friderici
Copyright: © 2015 |Pages: 11
DOI: 10.4018/978-1-4666-5888-2.ch550


This chapter merges three disciplines: molecular biology, Geographic Information Systems (GIS) and spatial statistics. GIS is the hardware and software for storing, managing and visualizing data on geographic location, traditionally location on the earth’s surface at the scale of continents, countries, states, towns, neighborhoods, blocks, etc. Applying GIS and spatial analysis to cells, and to the chromosomes within their nuclei that collectively form the genome, represents a vastly different scale of exploration. The genome is distributed across multiple chromosomes, and in mammals distinct regions of these chromosomes are organized into more than 20,000 genes.

The expression of genes and their relationships to one another are the target of this application. Genes are linear sequences of nucleotide bases (adenine, cytosine, guanine, thymine) that make up the genetic code. Genes are functional units that are transcribed into RNA, which provides the template for making proteins. This process is called gene expression, and the amount of RNA synthesized can indicate the activity level of a biochemical pathway. Microarray technology measures the RNA levels of the 20,000+ genes in the genome simultaneously, at a single point in time. Gene expression on a microarray is typically visualized as a “heat map,” with color-coded RNA expression levels (e.g., red to green from high to low). Figure 1 shows a microarray heat map for two experimental groups of mice described later in this chapter. The arrangement of genes on the microarray does not correspond to their location on chromosomes. However, now that the exact chromosomal location is known for most genes in humans and many other species (Chou, et al., 2004; Hanin, et al., 2009; Ishii, et al., 2000; Jurata, et al., 2004; Van de Wiel, et al., 2005), we can visualize gene expression spatially with technologies like GIS. Spatial analysis can provide insight into how each gene interacts with its neighbors along the chromosome. This is important for understanding how the expression of more than 20,000 genes is coordinated to allow the development of dramatically different tissues, such as muscle and brain.
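Placing expression values in chromosomal order amounts to joining microarray values with a gene annotation table and then sorting by position. The sketch below assumes each gene already carries a chromosome name and a start position in base pairs from some annotation source. The `Gene` record, its field names, and the distance threshold used to define “neighbors” are hypothetical choices for illustration, not the chapter’s own procedure.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    symbol: str        # hypothetical gene identifier
    chromosome: str    # chromosome name, e.g. "11"
    start_bp: int      # start position in base pairs (assumed from an annotation source)
    log2_ratio: float  # expression value taken from the microarray


def chromosome_order(genes):
    """Group genes by chromosome and sort each group by start position."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chromosome, []).append(g)
    for chrom in by_chrom:
        by_chrom[chrom].sort(key=lambda g: g.start_bp)
    return by_chrom


def neighbor_pairs(genes_on_chrom, max_distance_bp):
    """Return (gene, gene, distance) for genes on one sorted chromosome
    whose start positions lie within max_distance_bp of each other."""
    pairs = []
    n = len(genes_on_chrom)
    for i in range(n):
        for j in range(i + 1, n):
            d = genes_on_chrom[j].start_bp - genes_on_chrom[i].start_bp
            if d > max_distance_bp:
                break  # list is sorted, so every later gene is farther away
            pairs.append((genes_on_chrom[i].symbol, genes_on_chrom[j].symbol, d))
    return pairs
```

Because each chromosome’s list is sorted once, the inner loop can stop at the first gene beyond the threshold instead of scanning the whole chromosome.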

Figure 1.

Heat map of gene expression. A total of 279 genes were differentially expressed in mammary tissues from nulliparous mice compared to parous mice. Each row represents a unique gene. The heat map shows the level of mRNA expression for each mouse relative to the mean value of the nulliparous group, on a log2 scale: green indicates decreased levels and red increased levels, where -1 represents a 2-fold decrease and +1 a 2-fold increase relative to the mean of the nulliparous group.
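The log2 scale in the Figure 1 caption can be made concrete with a few lines of arithmetic. This sketch assumes a gene’s expression values are already log2-transformed (the caption does not say whether centering was applied to raw or logged intensities), so each mouse’s relative value is its log2 level minus the nulliparous-group mean, and a relative value v corresponds to a linear fold change of 2^v. The red/black/green color ramp and the ±2 clipping limit in the hypothetical `heat_color` helper are a common microarray convention assumed here, not details taken from Figure 1.

```python
from statistics import mean


def relative_log2(sample_log2, reference_log2):
    """Center one gene's log2 values on the mean of a reference group
    (here, the nulliparous mice)."""
    ref_mean = mean(reference_log2)
    return [x - ref_mean for x in sample_log2]


def fold_change(rel_log2):
    """Convert a relative log2 value to a linear fold change: +1 -> 2, -1 -> 0.5."""
    return 2 ** rel_log2


def heat_color(rel_log2, limit=2.0):
    """Map a relative log2 value to an (R, G, B) tuple: red for increases,
    green for decreases, black at 0; values beyond +/-limit are clipped."""
    t = max(-limit, min(limit, rel_log2)) / limit  # scaled to [-1, 1]
    red = round(255 * t) if t > 0 else 0
    green = round(255 * -t) if t < 0 else 0
    return (red, green, 0)
```

For example, `relative_log2([9.0, 11.0], [10.0, 10.0])` returns `[-1.0, 1.0]`, which `fold_change` turns into 0.5 and 2.0: a 2-fold decrease and a 2-fold increase.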


The aim of this chapter is to provide proof-of-concept for a novel application of spatial statistics and GIS to enhance our understanding of the mechanisms underlying gene regulation. We contrast the methodological advantages and shortcomings of previous methods for analyzing gene expression with those of our proposed method.



Various methods have been used to analyze gene expression patterns in microarray data. Hahn (2006) used time-series methods to look for coordinated gene expression (implying that genes act together) in Drosophila (the fruit fly). He treated the chromosome as a one-dimensional space with data measured at equally spaced intervals along its length, rather than at the more realistic, irregularly spaced positions of genes. Like others (De Iorio & Verzilli, 2007; Guanghua, et al., 2013), he regarded spatial autocorrelation (clustering) as a statistical nuisance to be corrected, rather than a phenomenon to be studied.
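The difference between equally and irregularly spaced positions matters when spatial autocorrelation is measured rather than corrected away. One standard statistic is Moran’s I, I = (n / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The sketch below computes it along a single chromosome using inverse-distance weights on actual base-pair positions, so unevenly spaced genes are weighted by how far apart they really are. The choice of Moran’s I, the inverse-distance weighting, and the `max_distance_bp` cutoff are assumptions for illustration; this is not presented as the chapter’s own method.

```python
def morans_i(positions_bp, values, max_distance_bp):
    """Moran's I for values observed at irregular positions along one chromosome.

    Weights are w_ij = 1 / |pos_i - pos_j| for pairs within max_distance_bp,
    and 0 otherwise (including pairs at identical positions).
    """
    n = len(values)
    if n != len(positions_bp) or n < 2:
        raise ValueError("need at least two genes, each with one position")
    mean_value = sum(values) / n
    dev = [v - mean_value for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined for constant values")

    weighted_cross = 0.0  # sum of w_ij * dev_i * dev_j over all i != j
    weight_total = 0.0    # W, the sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = abs(positions_bp[i] - positions_bp[j])
            if dist == 0 or dist > max_distance_bp:
                continue  # pair is not treated as neighbors
            w = 1.0 / dist
            weighted_cross += w * dev[i] * dev[j]
            weight_total += w
    if weight_total == 0:
        raise ValueError("no gene pairs fall within max_distance_bp")
    return (n / weight_total) * (weighted_cross / denom)
```

Values near +1 indicate that nearby genes have similar expression (clustering), values near -1 indicate alternation, and the expected value with no autocorrelation is -1/(n - 1). The double loop is O(n²); a genome-wide analysis would more likely use a sorted, windowed neighbor search.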

Key Terms in this Chapter

Gene Expression: The process by which the information in a gene’s DNA is transcribed into RNA, which for protein-coding genes is then translated into a protein. The synthesis of RNA is the “expression” of the gene. The amount of RNA produced can be measured and is known as the gene expression level.

Gene: A specific sequence of nucleotides in DNA or RNA, located on the chromosome, which holds information necessary to build and maintain an organism’s cells, and to pass genetic traits to offspring.

Genome: The genome is the entire set of all the genes and all the chromosomes of an organism, i.e., all the hereditary genetic information encoded in an organism. The exact configuration of this information may differ from one member of a species to the next, but the type of information contained (a collection of DNA sequences) is the same.

Transcription Factor: A protein that binds to specific regulatory DNA sequences, typically near a gene, and either enhances or represses that gene’s expression.

Ribonucleic Acid (RNA): Like DNA, RNA is a nucleic acid built from four chemical bases. Three of these are shared with DNA: adenine (A), guanine (G) and cytosine (C). The fourth is uracil (U), which replaces thymine (T). RNA is usually a single strand of these bases, produced when an enzyme called RNA polymerase uses DNA as a template in a process called transcription.

Deoxyribonucleic Acid (DNA): The molecule of which genes are made. It consists of nucleotides, each containing one of four bases (cytosine, guanine, adenine or thymine), arranged linearly in varying order. Their order serves as the code for building and maintaining living cells. DNA consists of two strands wound into a double helix, in which the bases on one strand are complementary to those on the other: adenine pairs with thymine, and cytosine with guanine. The paired bases are attached to backbones of sugars and phosphates, and the two strands can be separated so that each serves as a template for copying the code.

Geocoding: The process of converting tabular data on geographic location (e.g., street address) into map features.

Scale (Geographic): The level at which location is visualized or analyzed. It is the resolution at which a map is depicted and is expressed as a ratio of distance on the map to distance on the ground. For example, a map at a scale of 1:100,000 means that one unit of distance on the map (e.g., 1 inch) represents 100,000 of the same units (100,000 inches) on the ground. The larger the scale of the map, the more detail on the ground can be seen: large-scale maps cover less area but show greater detail (think of “zooming in,” where objects become larger and more detail is seen).

Microarray: A collection of thousands of microscopic DNA fragments, called probes, which are attached to a solid surface (microchip). Under certain conditions mRNA molecules will bind specifically to these fragments. The amount of mRNA bound to each probe provides a proxy measure for each gene’s expression level in a cell.

Chromosome: A collection of DNA sequences arranged in functional units referred to as genes which are separated by non-coding sequences.

Geographic Information Systems (GIS): The set of hardware and software for the capturing, input, storage, organization, visualization and analysis of spatial data, i.e., any data which can be linked to a spatial location.

Molecular Biology: The study of the structure and function of the cell at the molecular level. It is concerned with the mechanisms of DNA, RNA and protein synthesis, and the regulation of cell activity.
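The base-pairing rules in the DNA and RNA definitions can be stated compactly in code. This is a simplified sketch: it works base by base and ignores strand orientation (5′ to 3′), which a real sequence analysis would need to respect. The function and table names are hypothetical.

```python
# DNA base pairing: adenine-thymine, cytosine-guanine.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Transcription: RNA uses uracil (U) opposite adenine instead of thymine.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def complement_strand(dna):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIR[b] for b in dna.upper())


def transcribe(template):
    """Return the RNA bases paired with a DNA template strand, base by base
    (strand orientation is ignored in this sketch)."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template.upper())
```

For example, `complement_strand("ATGC")` gives `"TACG"`, and transcribing that template gives `"AUGC"`: the original sequence with U in place of T.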
