Chaos Game Representation of Mitochondrial Genomes: Markov Chain Model Simulation and Vertebrate Phylogeny

Zu-Guo Yu, Guo-Sheng Han, Bo Li, Vo Anh, Yi-Quan Li
DOI: 10.4018/978-1-60960-064-8.ch003

Abstract

Mitochondrial genomes have provided much information on the evolution of this organelle and have been used for phylogenetic reconstruction by various methods, with or without sequence alignment. In this paper, we explore mitochondrial genomes by means of the chaos game representation (CGR), a tool derived from chaotic dynamical systems theory. If a DNA sequence is a random collection of bases, its CGR will be a uniformly filled square; conversely, any pattern visible in the CGR carries information about the DNA sequence. First, we use Markov chain models to simulate the CGR of mitochondrial genomes. Then we model the noise background in the genome sequences by a Markov chain. Finally, we propose a simple alignment-free, correlation-related distance based on the CGR of mitochondrial genomes and use it to analyze the phylogeny of 64 selected vertebrates.

Introduction

The availability of long genomic sequences opens a new field of research devoted to the analysis of their structure. Distinctive short-word frequencies in genome sequences have been reported for various species and shown to be species-specific (Deschavanne et al., 1999). Dinucleotide relative abundance values have been shown to vary less within a genome than among species, and closely related organisms display more similar dinucleotide composition than distant organisms do (Karlin et al., 1997).

As a form of fractal image, the chaos game representation (CGR) of a DNA sequence, originally proposed by Jeffrey (1990), offers a handy approach for dealing with such large amounts of data. Goldman (1993) used CGRs to explain the observed patterns by calculating the dinucleotide and trinucleotide frequencies, and proposed two Markov chain models to simulate the CGRs of long DNA sequences. Deschavanne et al. (1999) detailed genomic comparisons involving parts of the genome or the whole genome, as well as some constructions of molecular phylogenies based on the CGR. The CGR technique was later used to compare genomes by Almeida et al. (2001) and Joseph & Sasikumar (2006). Jeffrey's (1990) CGR of DNA sequences has also been generalized and applied to visualizing and analyzing protein sequences and structures (Fiser et al., 1994; Basu et al., 1998; Yu et al., 2004; Yang et al., 2009). Yu et al. (2008) proposed an iterated function system to simulate the CGR of linked protein sequences of prokaryote genomes.
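To make the construction concrete, the following is a minimal sketch of a CGR for a DNA sequence: start at the centre of the unit square and, for each base, move halfway towards the corner assigned to that base. The corner assignment, the centre starting point, and the function names `cgr_points` and `cgr_grid` are assumptions made for illustration; other conventions exist, and this is not taken from the chapter itself.

```python
# Corner assignment is an assumption (one common convention); others are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR points of a DNA sequence, skipping non-ACGT symbols."""
    x, y = 0.5, 0.5  # start at the centre of the unit square (assumption)
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ambiguous symbols such as N are ignored
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid(seq, k):
    """Count CGR points in a 2^k x 2^k grid (rows index y, columns index x).

    From the k-th point onwards, the cell a point falls in is determined by
    the last k bases, so the grid is a rearranged table of k-mer counts.
    The first k - 1 points are skipped because they do not yet encode a
    full k-mer.
    """
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for i, (x, y) in enumerate(cgr_points(seq)):
        if i < k - 1:
            continue
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid
```

For a sequence of independent, equally likely bases, the counts in `cgr_grid` come out roughly equal across cells, which is the uniformly filled square described in the abstract; uneven counts reflect over- or under-represented short words.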

When complete genomes are considered, mitochondrial DNA has proven to be a powerful tool for phylogenetic reconstruction (Reyes et al., 1998). Mitochondrial genes and genomes have long been a major focus in molecular evolution, and these genomes are excellent candidates for demonstrating the power of evolutionary genomics. They have the advantage that they are present in high concentrations in many tissues, are reliably amplified by PCR, and can easily be enriched by purification of the mitochondria prior to DNA extraction (e.g., Dowling et al., 1996). Mitochondrial genomes also have a strong advantage over nuclear genes in that they are unlikely to experience many intraspecific recombination events (Pollack et al., 2000). Mitochondrial gene order breakpoints have been used to discuss early eukaryote evolution (Sankoff et al., 2000). Because of the problems caused by uncertainty in alignment (Wong et al., 2008), existing tools for phylogenetic analysis based on multiple alignment cannot be directly applied to whole-genome comparison and phylogenomic studies. There has therefore been growing interest in alignment-free methods for phylogenetic analysis using complete genome data (Wu et al., 2009).

In the present study, we investigate the CGR of mitochondrial genomes. First, we use the Markov chain models proposed by Goldman (1993) to simulate these CGRs. Then we propose a simple alignment-free correlation approach based on the CGR of mitochondrial genomes and use it to analyze the phylogeny of 64 selected vertebrates.
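The two steps just outlined can be illustrated with a rough sketch. It estimates a first-order Markov chain from the dinucleotide counts of a sequence, generates a simulated sequence from it, and compares two sequences through the Pearson correlation of their k-mer count vectors (which carry the same information as a CGR counted on a 2^k x 2^k grid). This preview does not give the chapter's exact models or distance formula, so the first-order chain, the distance (1 - r) / 2, and all function names below are assumptions chosen for illustration.

```python
import math
import random
from itertools import product

BASES = "ACGT"


def transition_probs(seq):
    """Estimate first-order transition probabilities from dinucleotide counts."""
    clean = [b for b in seq.upper() if b in BASES]
    counts = {a: {b: 0 for b in BASES} for a in BASES}
    for a, b in zip(clean, clean[1:]):
        counts[a][b] += 1
    probs = {}
    for a in BASES:
        total = sum(counts[a].values())
        if total == 0:
            # Base never seen as a predecessor: fall back to uniform (assumption).
            probs[a] = {b: 0.25 for b in BASES}
        else:
            probs[a] = {b: counts[a][b] / total for b in BASES}
    return probs


def simulate(probs, length, start="A", seed=0):
    """Generate a sequence of the given length from the Markov chain."""
    if length <= 0:
        return ""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        row = probs[out[-1]]
        out.append(rng.choices(BASES, weights=[row[b] for b in BASES])[0])
    return "".join(out)


def kmer_vector(seq, k):
    """Count all 4^k k-mers in a fixed order; non-ACGT symbols are dropped first."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    clean = "".join(b for b in seq.upper() if b in BASES)
    for i in range(len(clean) - k + 1):
        counts[clean[i:i + k]] += 1
    return [counts[w] for w in kmers]


def correlation_distance(u, v):
    """Map the Pearson correlation r of two vectors to a distance (1 - r) / 2.

    This particular mapping is an assumption, not the chapter's formula.
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return (1 - cov / (su * sv)) / 2
```

With these pieces, `correlation_distance(kmer_vector(genome, k), kmer_vector(simulate(transition_probs(genome), len(genome)), k))` gives one rough measure of how well the simulated sequence reproduces the short-word composition of `genome`. Applying the same distance to every pair of genomes yields a distance matrix that a standard tree-building method could take as input.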
