Gene Expression Mining Guided by Background Knowledge


Jirí Kléma (Czech Technical University in Prague, Czech Republic), Filip Železný (Czech Technical University in Prague, Czech Republic), Igor Trajkovski (Jožef Stefan Institute, Slovenia), Filip Karel (Czech Technical University in Prague, Czech Republic) and Bruno Crémilleux (Université de Caen, France)
DOI: 10.4018/978-1-60566-218-3.ch013


This chapter points out the role of genomic background knowledge in gene expression data mining. The authors demonstrate its application in several tasks, such as relational descriptive analysis, constraint-based knowledge discovery, feature selection and construction, and quantitative association rule mining. The chapter also emphasizes the diversity of background knowledge. In genomics, it can be stored in formats such as free text, ontologies, pathways, and links among biological entities, among many others. The authors hope that understanding the automated integration of heterogeneous data sources will help researchers reach compact and transparent, as well as biologically valid and plausible, results in their gene-expression data analysis.


High-throughput technologies such as microarrays or SAGE are at the center of a revolution in biotechnology, allowing researchers to monitor the expression of tens of thousands of genes simultaneously. However, gene-expression data analysis is a difficult task because the data usually have an inconveniently low ratio of samples (biological situations) to variables (genes). Datasets are often noisy and contain a large proportion of variables that are irrelevant in the context under consideration. Independent of the platform and the analysis methods used, the result of a gene-expression experiment should be driven by, annotated with, or at least verified against genomic background knowledge (BK).

As an example, let us consider a list of genes found to be differentially expressed in different types of tissues. A common challenge faced by researchers is to translate such gene lists into a better understanding of the underlying biological phenomena. Manual or semi-automated analysis of large-scale biological data sets typically requires biological experts with extensive knowledge of many genes to decipher the known biology that accounts for genes with correlated experimental patterns. The goal is to identify the relevant “functions”, or the global cellular activities, at work in the experiment. Experts routinely scan gene expression clusters to see whether any of the clusters is explained by a known biological function. Efficient interpretation of these data is challenging because the number and diversity of genes exceed the ability of any single researcher to track the complex relationships hidden in the data sets. However, much of the information relevant to the data is contained in publicly available gene ontologies and annotations. Including this additional data as a direct knowledge source for an algorithmic strategy may greatly facilitate the analysis.
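The question of whether a gene cluster is "explained by a known biological function" can be illustrated with a simple over-representation test. The sketch below is not the authors' method; it assumes a one-sided hypergeometric test, and the gene names, function terms, and the helper names `enrichment_p_value` and `term_enrichment` are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric tail: probability of seeing at least
    n_overlap annotated genes when n_selected genes are drawn at random
    from a universe in which n_annotated genes carry the annotation."""
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def term_enrichment(annotations, selected):
    """Return {term: p-value} for every term occurring in the annotations."""
    universe = set(annotations)
    selected = set(selected) & universe
    terms = set().union(*annotations.values())
    result = {}
    for term in terms:
        annotated = {g for g, ts in annotations.items() if term in ts}
        result[term] = enrichment_p_value(
            len(universe), len(annotated), len(selected), len(annotated & selected)
        )
    return result


# Hypothetical toy data: gene -> set of functional annotation terms.
annotations = {
    "g1": {"apoptosis"},
    "g2": {"apoptosis", "transport"},
    "g3": {"apoptosis"},
    "g4": {"transport"},
    "g5": {"metabolism"},
    "g6": {"metabolism"},
    "g7": {"transport"},
    "g8": {"metabolism"},
}
selected = {"g1", "g2", "g3"}  # e.g. one cluster of differentially expressed genes

p_values = term_enrichment(annotations, selected)
for term, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{term}: p = {p:.4f}")
```

In this toy setting all three selected genes carry the "apoptosis" term, so that term receives the smallest p-value (1/56). With real data, many terms are tested at once, so a multiple-testing correction would normally be applied; it is omitted here for brevity.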

This chapter gives a summary of our recent experience in mining of transcriptomic data. The chapter accentuates the potential of genomic background knowledge stored in various formats such as free texts, ontologies, pathways, links among biological entities, etc. It shows the ways in which heterogeneous background knowledge can be preprocessed and subsequently applied to improve various learning and data mining techniques. In particular, the chapter demonstrates an application of background knowledge in the following tasks:

  • Relational descriptive analysis

  • Constraint-based knowledge discovery

  • Feature selection and construction (and its impact on classification accuracy)

  • Quantitative association rule mining

The chapter starts with an overview of the genomic datasets and the accompanying background knowledge analyzed in the text. The section on relational descriptive analysis presents a method for identifying groups of differentially expressed genes that are functionally similar according to the background knowledge. The section on genomic classification focuses on methods that help to increase the accuracy and understandability of classifiers by incorporating background knowledge into the learning process. The section on constraint-based knowledge discovery presents and discusses several background knowledge representations that enable effective mining of meaningful over-expression patterns representing intrinsic associations among genes and biological situations. The section on association rule mining briefly introduces a quantitative algorithm suitable for real-valued expression data and demonstrates how background knowledge can be used to prune its output rule set. The conclusion summarizes the chapter content and outlines our future plans for further integration of the presented techniques.


Gene-Expression Datasets And Background Knowledge

The following paragraphs give a brief overview of information resources used in the chapter. The primary role of background knowledge is to functionally describe individual genes and to quantify their similarity.
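As a rough illustration of how functional descriptions can be turned into a gene similarity score, the sketch below compares genes by the overlap of their annotation term sets. This excerpt does not state which similarity measures the chapter actually uses, so the Jaccard index, the gene names, and the annotation terms here are assumptions made only for illustration.

```python
from itertools import combinations


def jaccard(terms_a, terms_b):
    """Jaccard index of two annotation term sets (0.0 if both are empty)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def pairwise_similarity(annotations):
    """Return {(gene_a, gene_b): similarity} for every unordered gene pair."""
    return {
        (a, b): jaccard(annotations[a], annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    }


# Hypothetical annotations: gene -> set of functional terms.
annotations = {
    "geneA": {"apoptosis", "transport"},
    "geneB": {"apoptosis"},
    "geneC": {"metabolism"},
}

similarity = pairwise_similarity(annotations)
for pair, score in similarity.items():
    print(pair, round(score, 2))
```

A set-overlap measure of this kind ignores the hierarchy of an ontology; measures that take term specificity or ontology structure into account would be a natural refinement.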
