GA-Based Data Mining Applied to Genetic Data for the Diagnosis of Complex Diseases

Vanessa Aguiar, Jose A. Seoane, Ana Freire, Ling Guo

Source Title: Soft Computing Methods for Practical Environment Solutions: Techniques and Studies

DOI: 10.4018/978-1-61520-893-7.ch014

Abstract

A new algorithm is presented for finding genotype-phenotype association rules in data related to complex diseases. The algorithm is based on genetic algorithms, a technique of evolutionary computation. It was compared to several traditional data mining techniques and was shown to obtain better classification scores and to find more rules on artificially generated data. It also obtained similar results on some UCI Machine Learning datasets. In this chapter it is assumed that several groups of Single Nucleotide Polymorphisms (SNPs) have an impact on the predisposition to develop a complex disease such as schizophrenia. This is expected to be validated on real data in the near future.

Introduction

Complex diseases are those that result from the interaction of multiple factors, usually including both genetic and environmental factors (Risch, 2000). Due to their nature, it is hard to establish a relationship between a single gene and the disease. In general, this type of disease is caused by the combined effects of several sets of Single Nucleotide Polymorphisms (SNPs) which, separately, have a low effect. Complex diseases such as cancer, mental disorders and cardiovascular diseases are highly prevalent and have a high impact. This weighs heavily on hospital costs and, therefore, on the costs of the national health system.

A SNP (Den Dunnen & Antonarakis, 2000) is a single nucleotide site where two (of four) different nucleotides occur in a high percentage of the population, that is, in at least 1% of the population. Since there are 14 million SNPs in human beings, a huge amount of data obtained from DNA genotyping needs to be dealt with, and many variables have to be taken into account.

This data can be analysed by carrying out association studies. In a genetic association study, the frequency of a SNP variant in people affected by the same disease is compared to the frequency of the same variant in healthy people (the control population). The subjects must not be related to one another, and they have to belong to the same ethnic group and have the same geographic origin.
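
The case-control comparison described above can be sketched in a few lines of Python. This is only an illustration, not the chapter's method: the allele counts are made up, and the Pearson chi-square statistic and odds ratio are standard choices assumed here, since the text does not name a specific test.

```python
def allele_frequency(variant_count, total_count):
    """Share of alleles in a group that carry the variant."""
    return variant_count / total_count


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def odds_ratio(a, b, c, d):
    """Odds of carrying the variant in cases relative to controls."""
    return (a * d) / (b * c)


# Hypothetical allele counts for one SNP (not taken from the chapter).
cases_variant, cases_reference = 120, 80
controls_variant, controls_reference = 90, 110

freq_cases = allele_frequency(cases_variant, cases_variant + cases_reference)
freq_controls = allele_frequency(
    controls_variant, controls_variant + controls_reference
)
statistic = chi_square_2x2(
    cases_variant, cases_reference, controls_variant, controls_reference
)
ratio = odds_ratio(
    cases_variant, cases_reference, controls_variant, controls_reference
)

print(f"variant frequency in cases:    {freq_cases:.2f}")
print(f"variant frequency in controls: {freq_controls:.2f}")
print(f"chi-square statistic:          {statistic:.2f}")
print(f"odds ratio:                    {ratio:.2f}")
```

A test of this kind looks at one SNP at a time, so a study covering many SNPs repeats it once per SNP.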

Carrying out such studies is expensive, mostly due to the cost of genotyping. Genotyping is the process of determining the genotype of an individual using a biological test. In Spain, for example, genotyping 74 SNPs for 720 samples costs nearly €8,000. The accuracy of the technologies used for this purpose ranges from 85% to 98%, depending on which one is chosen. The choice of technology depends on the approach and purpose of the study and on the number of SNPs to be genotyped. An accuracy rate below 100% makes the analysis of genetic data more difficult, as there will be missing data.
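
To put these figures in perspective, the short calculation below derives the cost per genotype and a rough count of missing genotypes. It rests on two assumptions that are not in the text: the cost is rounded to exactly 8,000 euros, and the accuracy rate is read as the share of genotypes that are successfully called, so that every remaining genotype is treated as missing.

```python
N_SNPS = 74
N_SAMPLES = 720
TOTAL_COST_EUR = 8000  # "nearly" 8,000 in the text, so treat as an upper bound

# Each sample is genotyped at each SNP.
genotypes = N_SNPS * N_SAMPLES
cost_per_genotype = TOTAL_COST_EUR / genotypes


def expected_missing(n_genotypes, accuracy):
    """Expected number of missing genotypes, assuming (hypothetically)
    that every genotype outside the accuracy rate is lost."""
    return n_genotypes * (1 - accuracy)


print(f"genotypes in the study: {genotypes}")
print(f"cost per genotype:      {cost_per_genotype:.3f} EUR")
for accuracy in (0.85, 0.98):
    missing = expected_missing(genotypes, accuracy)
    print(f"accuracy {accuracy:.0%}: about {missing:.0f} missing genotypes")
```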

An important challenge that molecular association studies face in the post-genomic era is to understand the interconnections between networks of genes and their products. These networks are initiated and regulated by a variety of environmental changes. The variety of genotype definitions increases the number of tests that need to be run and also involves a large number of comparisons. The non-reproducibility of many results obtained in several studies has led to criticism of association studies.

SNP data and haplotypes used in association studies of complex diseases have three main characteristics which represent important challenges in data analysis: complexity, heterogeneity and a constantly evolving nature. In addition, this type of data is large, redundant, diverse and distributed.

It is heterogeneous in the sense that it involves a large number of data types, including categorical and continuous data, sequences, and temporal data, as well as incomplete and missing data. There is a lot of redundancy in SNP and haplotype databases. This type of data is very dynamic and evolves continuously. Not only the data but also the schema evolves, which means that special knowledge is required when designing modelling techniques. Finally, SNP and haplotype data is complex and has intrinsic features and subtle patterns, in the sense that it is very rich in associated complex phenotype traits or common multifactor diseases.

In complex diseases, in general, the combination of certain genes predisposes an individual to develop a disease, and environmental factors increase the impact of these genes on the development of the disease. This is known as epistasis or epistatic effect. In addition, environmental factors which seem to have only a moderate impact at the population level might carry higher risks in subpopulations with certain genetic predispositions. There are major methodological challenges in the study of gene-gene and gene-environment interactions. Another important challenge is to study large datasets in order to identify combinations of SNPs that interact to increase the predisposition to develop a certain complex disease. Thus, there is a need to develop methods capable of performing a massive analysis of SNP data related to complex diseases, beyond what traditional statistical approaches allow.
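
The following toy example illustrates why combinations of SNPs can be missed when each SNP is examined on its own. Everything in it is hypothetical: the cohort is synthetic, the SNPs are coded simply as carrier (1) or non-carrier (0), and disease status follows a deliberately extreme XOR pattern chosen only to make the point visible.

```python
from itertools import product

# Synthetic cohort: 25 individuals for each of the four carrier combinations.
# Each row is (snp1, snp2, diseased); disease occurs when exactly one of the
# two variants is carried. This pattern is made up for illustration.
cohort = [
    (snp1, snp2, snp1 ^ snp2)
    for snp1, snp2 in product((0, 1), repeat=2)
    for _ in range(25)
]


def disease_rate(rows):
    """Fraction of diseased individuals among the given rows."""
    rows = list(rows)
    return sum(diseased for _, _, diseased in rows) / len(rows)


# Looking at each SNP separately: carriers and non-carriers look identical.
marginal_snp1 = {g: disease_rate(r for r in cohort if r[0] == g) for g in (0, 1)}
marginal_snp2 = {g: disease_rate(r for r in cohort if r[1] == g) for g in (0, 1)}

# Looking at both SNPs together: the interaction becomes obvious.
joint = {
    (g1, g2): disease_rate(r for r in cohort if r[0] == g1 and r[1] == g2)
    for g1, g2 in product((0, 1), repeat=2)
}

print("disease rate by SNP 1 status:", marginal_snp1)
print("disease rate by SNP 2 status:", marginal_snp2)
print("disease rate by both SNPs:   ", joint)
```

In this synthetic cohort the disease rate is the same for carriers and non-carriers of either SNP, while the joint table separates the groups completely. Real interactions are far weaker and noisier, but the same effect motivates searching over SNP combinations rather than single SNPs.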

Hence, the objective of this chapter is to develop an algorithm that will analyse data obtained from genotyping as part of an association study. This will help reduce the costs of this type of study. The chapter has the following structure:
