Improved Feature Selection by Incorporating Gene Similarity into the LASSO
Christopher E. Gillies (Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA), Xiaoli Gao (Department of Mathematics and Statistics, Oakland University, Rochester, MI, USA), Nilesh V. Patel (Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA), Mohammad-Reza Siadat (Department of Computer Science and Engineering, Oakland University, Rochester, MI, USA) and George D. Wilson (Radiation Oncology Department and BioBank Department Beaumont Health System, Royal Oak, MI, USA)
Copyright: © 2012 | Pages: 22
DOI: 10.4018/jkdb.2012010101

Abstract

Personalized medicine is the customization of treatments to a patient’s genetic profile, and it has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data on which they compared their model to the standard LASSO model and an interaction LASSO model. Their model outperformed both the standard and interaction LASSO models in detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
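The coordinate descent strategy the abstract mentions can be illustrated on the standard LASSO, which the authors' method extends. The sketch below is a minimal NumPy implementation of cyclic coordinate descent with soft-thresholding for the plain L1-penalized least-squares objective; it does not include the authors' gene-similarity interaction penalty, and the function names and update schedule are our own assumptions, not code from the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Solution of the one-dimensional LASSO subproblem."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam*||b||_1.

    Assumes the columns of X are standardized (mean 0, variance 1).
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]        # remove gene j's contribution
            z = X[:, j] @ r / n           # partial correlation with residual
            beta[j] = soft_threshold(z, lam)
            r -= X[:, j] * beta[j]        # restore gene j's contribution
    return beta
```

Each coordinate update has a closed form, which is what makes coordinate descent attractive for the high-dimensional (few patients, many genes) setting described above.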
Article Preview

Introduction

An important goal of the healthcare industry is personalized medicine, which has the potential to transform healthcare practice. One aspect of personalized medicine focuses on customizing treatments based on the genetic profile of a patient. To achieve this goal, researchers look for biomarkers, such as the expression of a gene or group of genes that correlates with treatment outcomes. Biologists use microarrays to read the gene expression of a biospecimen; see (Dubitzky, Granzow, Downes, & Berrar, 2009) for an introduction to microarrays. Some groups use newer techniques such as RNA-Seq to read gene expression (Marioni, Mason, Mane, Stephens, & Gilad, 2008). In this paper, we focus on microarrays; however, we believe our method could be applied in a similar manner to RNA-Seq data. The result of a microarray experiment is a gene expression profile. We represent a gene expression profile by x = (x_1, …, x_p), where p is the number of genes and x_j corresponds to the expression of gene, or feature, j. A series of n microarray experiments yields an n × p matrix X, where each row represents a single microarray experiment. In this paper, we assume each row of X is a biospecimen from a different patient. The gene expression matrix can be used for supervised or unsupervised learning; however, we focus exclusively on supervised learning. In supervised learning, we have a vector y = (y_1, …, y_n), where each y_i corresponds to a row of X. If y is continuous, then this is a regression problem; if y represents an element from a set of categories or labels, then this is called a classification problem. A representation of these notational concepts is shown in Equation (1).

X = [x_11 … x_1p; x_21 … x_2p; …; x_n1 … x_np],  y = (y_1, y_2, …, y_n)^T  (1)
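Concretely, this notation maps directly onto array code. The snippet below is a generic illustration of the data layout with made-up dimensions and random values, not data from the paper: rows of X are patients (microarray experiments), columns are genes, and y is the response.

```python
import numpy as np

n, p = 5, 1000                         # 5 patients, 1,000 genes (illustrative sizes)
rng = np.random.default_rng(0)

# X: each row is one microarray experiment (one patient's profile);
# column j holds the expression of gene j across patients.
X = rng.standard_normal((n, p))

# Regression: y is continuous (e.g., a treatment-response score).
y_reg = rng.standard_normal(n)

# Classification: y holds labels (e.g., responder vs. non-responder).
y_clf = rng.integers(0, 2, size=n)

print(X.shape)                         # (5, 1000)
```

With far more columns than rows (p >> n), ordinary least squares is ill-posed, which is why penalized methods such as the LASSO are used in this setting.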
