Recognition of Translation Initiation Sites in Arabidopsis Thaliana

Haitham Ashoor, Arturo M. Mora, Karim Awara, Boris R. Jankovic, Rajesh Chowdhary, John A.C. Archer, Vladimir B. Bajic

Source Title: Systemic Approaches in Bioinformatics and Computational Systems Biology: Recent Advances

DOI: 10.4018/978-1-61350-435-2.ch005

Abstract

Their results suggest that, in spite of the considerable evolutionary distance between Homo sapiens and A. thaliana, their approach successfully recognized deeply conserved genomic signals that characterize TIS. Moreover, they report the highest accuracy of TIS recognition in A. thaliana genomic DNA sequences.

Introduction

One of the objectives of bioinformatics is to identify important biological signals in various genomic sequences. The translation initiation site (TIS) is one such signal; it denotes the start codon at which translation initiates. Accurate recognition of TIS signals can help in the discovery of protein-coding genes and in better annotation of gene loci (Preiss & Hentze, 2003; Do & Choi, 2006). Annotation engines typically assign the TIS to the first ATG codon that generates a maximal Open Reading Frame (ORF), but this is by no means sufficiently accurate.
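The "first ATG with a maximal ORF" heuristic mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions (forward strand only, standard stop codons, no introns); the function names `orf_length_from` and `naive_tis` are hypothetical and do not come from any annotation engine.

```python
# Minimal sketch of the "first ATG that yields the longest ORF" heuristic.
# Assumptions (not from the chapter): forward strand only, standard stop
# codons, intron-free sequence. All names here are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_from(seq, start):
    """Return the ORF length in nucleotides (stop codon included) for an ORF
    beginning at `start`, or 0 if no in-frame stop codon follows."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3 - start
    return 0


def naive_tis(seq):
    """Return (position, orf_length) of the earliest ATG with the longest ORF,
    or (None, 0) if no ATG is followed by an in-frame stop codon."""
    seq = seq.upper()
    best_pos, best_len = None, 0
    for pos in range(len(seq) - 2):
        if seq[pos:pos + 3] == "ATG":
            length = orf_length_from(seq, pos)
            # Strict '>' keeps the earliest ATG when ORF lengths tie.
            if length > best_len:
                best_pos, best_len = pos, length
    return best_pos, best_len


# Two in-frame ATGs (positions 2 and 8) share the stop codon TAG;
# the upstream one gives the longer ORF.
print(naive_tis("GGATGCCCATGAAATAGCC"))
```

A rule like this ignores the sequence context around each ATG, which is one reason it can pick the wrong start codon.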

Canonical TISs consist of the ATG nucleotide triplet, but in rare cases may consist of ACG or CTG triplets. In this study, we focus on the canonical ATG sequences (Preiss & Hentze, 2003). However, an ATG triplet will occur, on average, once every 64 nucleotides in random DNA. Thus, in higher eukaryotes with large genomes, there will be a plethora of false TIS signals. For instance, in the 3.3 billion base pair (bp) human genome, with an estimated coding capacity of ~30,000 genes, and assuming that all are protein coding and have no alternative TISs, there will be ~30,000 real TISs and 103,095,000 false TIS signals, i.e., a ~3,436-fold excess of false over true signals. Thus, there is a clear need for accurate prediction of TIS signals contained in the DNA sequence.
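The figures in this estimate can be checked with a few lines of arithmetic. The sketch below assumes uniform random DNA, so that a given triplet is ATG with probability (1/4)^3 = 1/64. The quoted count of false signals is reproduced when ATG triplets are counted on both strands; that is an inference from the numbers, not something the text states. The function name `expected_atg_counts` is hypothetical.

```python
# Back-of-the-envelope check of the false-TIS estimate.
# Assumptions: uniform random DNA (P(ATG) = 1/64 per position) and ATGs
# counted on both strands (inferred from the quoted figures).

def expected_atg_counts(genome_bp, true_tis, strands=2):
    """Return (expected ATG triplets, false TIS candidates, false/true ratio)."""
    expected_atg = strands * genome_bp / 64
    false_tis = expected_atg - true_tis
    return expected_atg, false_tis, false_tis / true_tis


total, false_tis, ratio = expected_atg_counts(3_300_000_000, 30_000)
print(f"expected ATG triplets: {total:,.0f}")   # 103,125,000
print(f"false TIS candidates:  {false_tis:,.0f}")  # 103,095,000
print(f"false-to-true ratio:   {ratio:,.1f}")   # 3,436.5
```

Real genomes are not uniform random sequences, so these values only indicate the order of magnitude.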

The presence of introns within genes makes the accurate prediction of TIS signals from genomic DNA sequence much more difficult than from cDNA or mRNA sequences. Extensive research has been carried out to develop computational methods for recognition of TISs, mainly in cDNA and mRNA sequences. Perhaps understandably, much less attention has been given to the more difficult problem of computationally identifying these signals within genomic DNA. An associated problem is determining the best set of features for discriminating true from false genomic signals (Saeys et al., 2007), in our case TIS signals. In this study, we introduce several new global features to the pool of already studied TIS-related features, and we select the set of relevant features using a wrapper method.

Most computational approaches to TIS recognition have compared their results on the mRNA dataset of Pedersen and Nielsen (Pedersen & Nielsen, 1997). This dataset contains a mix of mRNA sequences from different vertebrate genomes. Pedersen and Nielsen implemented an Artificial Neural Network (ANN) to predict TISs and reported an accuracy of 85% on their dataset. Later, Hatzigeorgiou reported an accuracy of 94% on human cDNA sequences that contain complete ORFs (Hatzigeorgiou, 2002). She also employed a combination of two ANNs as a prediction model. Ma and colleagues developed TISKey (Ma et al., 2006), which uses an ensemble of Support Vector Machines (SVMs), and reported an accuracy of 93.7% on the Pedersen and Nielsen dataset. Zeng and AlHaj used a multiple-agent architecture with reinforcement learning and reported 96.72% accuracy (Zeng & AlHaj, 2008). Rajapakse and Ho implemented a hybrid approach combining a Markov model and an ANN on the Pedersen and Nielsen dataset (Rajapakse & Ho, 2005) and reported 93.8% sensitivity and 96.9% specificity using 3-fold cross-validation. Li et al. used the Hatzigeorgiou dataset of mRNA sequences with full ORFs and, using a Gaussian mixture model, reported a sensitivity of 98.06% and a specificity of 92.14% (Li et al., 2004).

Studies based on genomic DNA sequences have exhibited lower levels of accuracy. Saeys et al. reported 80% sensitivity and 87.5% specificity on human genomic DNA sequences (Saeys et al., 2007). Sparks and Brendel developed the MetWAMer system, which uses a perceptron classification algorithm, clustering of data by the k-medoids algorithm, and methionine-weight array matrices to achieve an accuracy of 85% on an A. thaliana genomic DNA sequence dataset (Sparks & Brendel, 2008). Pertea and Salzberg demonstrated that GlimmerM achieved 84% accuracy on both A. thaliana and human genomic sequences (Pertea & Salzberg, 2002).
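For readers comparing the figures above, the sketch below shows how sensitivity, specificity, and accuracy are commonly computed from confusion-matrix counts. It assumes the standard definitions; some gene-prediction papers use "specificity" for TP/(TP+FP) instead, and the text does not say which definition each cited study used. The counts in the example are invented purely for illustration, and the function name `tis_metrics` is hypothetical.

```python
# Standard definitions of the metrics quoted in this section.
# tp/fn: true ATG start sites predicted as TIS / missed.
# tn/fp: non-TIS ATGs correctly rejected / wrongly predicted as TIS.

def tis_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)            # fraction of true TISs recovered
    specificity = tn / (tn + fp)            # fraction of false ATGs rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Invented counts, for illustration only (not from any cited study).
sens, spec, acc = tis_metrics(tp=90, fp=50, tn=950, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because false ATGs vastly outnumber true TISs in genomic DNA, accuracy alone can look high even when many true sites are missed, which is why sensitivity and specificity are often reported together.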
