Penguin Search Optimisation Algorithm for Finding Optimal Spaced Seeds

Youcef Gheraibia, Abdelouahab Moussaoui, Youcef Djenouri, Sohag Kabir, Peng-Yeng Yin, Smaine Mazouzi

Source Title: International Journal of Software Science and Computational Intelligence (IJSSCI) 7(2)

DOI: 10.4018/IJSSCI.2015040105

Abstract

This paper develops PeSeeD, a new metaheuristic algorithm for finding optimal spaced seeds. Sequence matching is a hot topic in bio-informatics and is used in many applications, such as understanding the functional, structural, or evolutionary relationships between sequences. The most relevant sequence matching methods are based on seeds designed to match two biological sequences. Seeds were first introduced through the Blastn tool, which builds seeds of length 11. However, not all local alignments include an identical fragment of length 11. The spaced seeds approach is one of the methods that does not require matches at consecutive positions. Dynamic programming can solve this kind of problem, but it takes quadratic time. Several approaches have since been proposed to improve search sensitivity within a reasonable runtime, and heuristic-based approaches can be considered to reduce their complexity. The aim is to find a subset of spaced seeds that improves sensitivity without increasing computation time. In this paper, the optimal subset of spaced seeds is explored using a bio-inspired approach, the penguins search optimisation algorithm (PeSOA for short). The authors further propose an efficient heuristic for computing the overlap complexity between seeds. To evaluate the efficiency of the proposed approach, they compare the obtained results with those of several seed-based software tools. The results are very promising in terms of sensitivity and of computation time for the overlap complexity.

1. Introduction

Biological databases hold a large amount of information of different types (sequence, structure, etc.). Their entries are either proteins (Uniprot) or nucleic sequences (EMBL, GenBank, etc.). Performing a similarity search among these entries in a reasonable time is therefore a difficult task, and developing an efficient algorithm based on spaced seeds is a major challenge for the bio-informatics community. Spaced seeds offer a specialised optimisation for next-generation sequencing. Several methods for solving this problem have been proposed recently, and they fall into two categories. The first employs dynamic programming and finds an exact solution with quadratic time complexity (Smith et al., 1981). As biological databases grow larger, this exact approach usually becomes too time consuming. The second category comprises heuristic algorithms (Lipman et al., 1985), which achieve good solutions in a reasonable time and are based on approximate string matching.
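The quadratic cost of the exact approach can be seen in a minimal sketch of local alignment scoring by dynamic programming, in the style of the Smith-Waterman recurrence. The function name and the scoring values (match, mismatch, and a linear gap penalty) are illustrative assumptions, not parameters taken from the paper.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score of a and b.

    Assumed scoring scheme: +match, mismatch and a linear gap penalty.
    Every cell of a len(a) x len(b) table is visited once, which is
    where the quadratic running time comes from.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score of aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only two rows of the table are kept, so memory is linear, but the two nested loops remain and the time stays proportional to the product of the sequence lengths.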

Seed-based alignment of two biological sequences is an efficient technique used by several algorithms designed for newer generations of sequencing data. Among these algorithms, BLAST (Altschul et al., 1990) uses 11 consecutive matches, called a contiguous seed and denoted 11111111111. It requires an identical stretch of length 11, which an alignment does not always contain. To increase the probability of finding an alignment, PatternHunter II (Li et al., 2004) uses one or several non-contiguous seeds called spaced seeds. Concretely, each spaced seed S is a vector of n elements, where n is the length of the seed, defined as follows: S[i] = 1 if position i requires a match and S[i] = 0 if position i need not match. The number of ones in the seed is called the weight of the seed. The sensitivity of a seed is used to evaluate its quality for finding alignments. The objective of a spaced seed is to increase sensitivity without degrading computation time. The sensitivity is approximated by matching the spaced seed against a Bernoulli representation of the alignment.
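As a minimal sketch of this definition, the code below represents a seed as a string of 1 and 0 characters and reports where it hits an ungapped pair of equal-length sequences. The function names and the example seed 1101 are hypothetical and chosen only for illustration.

```python
def seed_weight(seed: str) -> int:
    """Weight of a seed: the number of positions that require a match."""
    return seed.count("1")


def seed_hits(seed: str, s: str, t: str) -> list[int]:
    """Offsets at which the seed hits the ungapped pair (s, t).

    The seed hits at offset i when s and t agree at every position
    i + k such that seed[k] == '1'; positions marked '0' are ignored.
    """
    if len(s) != len(t):
        raise ValueError("s and t must have the same length")
    required = [k for k, c in enumerate(seed) if c == "1"]
    hits = []
    for i in range(len(s) - len(seed) + 1):
        if all(s[i + k] == t[i + k] for k in required):
            hits.append(i)
    return hits


if __name__ == "__main__":
    s, t = "ACGTACGT", "ACTTACGA"        # the pair differs at positions 2 and 7
    print(seed_weight("1101"))           # 3
    print(seed_hits("1101", s, t))       # [0, 3]
    print(seed_hits("1111", s, t))       # [3]
```

In this toy pair the spaced seed 1101 hits at offset 0 because its don't-care position falls on the mismatch at position 2, whereas the contiguous seed 1111 cannot hit there.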

Homology search in biological sequences is a very important task for discovering and understanding similarities among genes and proteins; its goal is to find similar segments, or local alignments, between two DNA or protein sequences (Altschul et al., 1990). DNA and protein databases have become very large. For example, the EMBL Nucleotide Sequence Database (EMBL-Bank) grew from around 600 entries in 1982 to over 6.2 × 10^8 by March 2015, so homology search is very time consuming and far from achievable in reasonable time (Altschul et al., 1990).

An optimisation problem such as finding optimal spaced seeds involves two conflicting factors: computation time (searching speed) and solution quality (sensitivity). The aim of all previous methods is to design, in reasonable time, a good set of spaced seeds with high sensitivity, so there is a tradeoff between computation cost and sensitivity (Choi et al., 2004). Indeed, sensitivity can be increased by decreasing the required weight of a hit; however, decreasing the weight increases both the runtime and the number of false hits.
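The effect of weight on sensitivity can be illustrated with a small brute-force sketch under a Bernoulli model, in which each position of an ungapped region matches independently with probability p. The function name, region length, and value of p are assumptions made for illustration; the enumeration is exponential in the region length and is not the method used in the paper.

```python
from itertools import product


def hit_probability(seed: str, length: int, p: float) -> float:
    """Exact probability that the seed hits a random region.

    The region is modelled as `length` independent Bernoulli(p) trials
    (1 = match, 0 = mismatch). All 2**length regions are enumerated,
    so this is only practical for short regions.
    """
    required = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    total = 0.0
    for region in product((0, 1), repeat=length):
        hit = any(
            all(region[i + k] for k in required)
            for i in range(length - span + 1)
        )
        if hit:
            ones = sum(region)
            total += p ** ones * (1 - p) ** (length - ones)
    return total


if __name__ == "__main__":
    for seed in ("111", "1111", "11011"):
        print(seed, round(hit_probability(seed, 12, 0.7), 4))
```

Every hit of the contiguous seed 1111 contains a hit of 111, so the lower-weight seed can never be less sensitive here; the price, as noted above, is a larger number of random hits to examine.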

The penguins search optimisation algorithm (Gheraibia et al., 2013) is a meta-heuristic for solving complex problems, based on the hunting behaviour of penguins. Their hunting strategy is remarkable: penguins coordinate their efforts and synchronise their dives to optimise the global energy spent in collective hunting.
