Improved Feature Selection by Incorporating Gene Similarity into the LASSO

Christopher E. Gillies, Xiaoli Gao, Nilesh V. Patel, Mohammad-Reza Siadat, George D. Wilson
Copyright: © 2012 | Pages: 22
DOI: 10.4018/jkdb.2012010101

Personalized medicine customizes treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult because there are usually few patients and thousands of genes, which leads to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data and compared their model with the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
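As background for the abstract, the following is a minimal sketch of cyclic coordinate descent for the standard LASSO only. It is not the authors' modified LASSO: it has no interaction terms and no gene-similarity penalty. The objective scaling (1/(2n)), the function names `soft_threshold` and `lasso_coordinate_descent`, the fixed iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Approximately minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Illustrative standard LASSO; the penalty weight `lam` and the
    fixed number of sweeps `n_iters` are assumptions of this sketch.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float)             # equals y - X @ beta while beta == 0
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||x_j||^2 for each column
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # an all-zero column carries no signal
            # Correlation of feature j with the residual that excludes
            # feature j's own current contribution.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


if __name__ == "__main__":
    # Synthetic "few samples, many features" data; not real expression data.
    rng = np.random.default_rng(0)
    n, p = 40, 200
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:3] = [3.0, -2.0, 1.5]
    y = X @ true_beta + 0.1 * rng.standard_normal(n)
    beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
    print("indices of nonzero coefficients:", np.flatnonzero(beta_hat))
```

Each coordinate update has a closed form through soft-thresholding, which sets small coefficients exactly to zero. This is how the L1 penalty produces the parameter sparsity described above.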

An important goal of the healthcare industry is personalized medicine, which has the potential to transform healthcare practice. One aspect of personalized medicine focuses on customizing treatments based on the genetic profile of a patient. To achieve this goal, researchers are looking for biomarkers, such as the expression of a gene or group of genes, that correlate with treatment outcomes. Biologists use microarrays to read the gene expression of a biospecimen; see Dubitzky, Granzow, Downes, and Berrar (2009) for an introduction to microarrays. Some groups are using newer techniques such as RNA-seq to read gene expression (Marioni, Mason, Mane, Stephens, & Gilad, 2008). In this paper, we focus on microarrays; however, we believe our method could be applied in a similar manner to RNA-seq data. The result of a microarray experiment is a gene expression profile. We represent a gene expression profile by $x = (x_1, \ldots, x_p)$, where $p$ is the number of genes and $x_j$ corresponds to the expression of gene, or feature, $j$. A series of $n$ microarray experiments yields a matrix $X \in \mathbb{R}^{n \times p}$ where each row represents a single microarray experiment. In this paper, we assume each row of $X$ is a biospecimen from a different patient. The gene expression matrix $X$ can be used for supervised or unsupervised learning. However, we focus exclusively on supervised learning. In supervised learning, we have a vector $y = (y_1, \ldots, y_n)$ where each $y_i$ corresponds to a row of $X$. If $y_i$ is continuous then this is a regression problem, but if $y_i$ represents an element from a set of categories or labels then this is called a classification problem. A representation of these notational concepts can be seen.
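The notation in this paragraph can be sketched as arrays. The symbols used here are conventional choices assumed for illustration: n for the number of microarray experiments, p for the number of genes, X for the expression matrix, and y for the response vector. The helper names `make_expression_matrix` and `problem_type` are hypothetical, and the values are simulated rather than real expression data.

```python
import numpy as np


def make_expression_matrix(n, p, seed=0):
    """Simulate an n-by-p matrix: one row per experiment, one column per gene."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n, p))


def problem_type(y):
    """Hypothetical helper: call the task regression when y holds
    floating-point values, and classification otherwise.
    This dtype rule is a simplification assumed for the sketch."""
    y = np.asarray(y)
    if np.issubdtype(y.dtype, np.floating):
        return "regression"
    return "classification"


n, p = 5, 1000                      # few patients, many genes (n << p)
X = make_expression_matrix(n, p)    # row i is the profile of patient i
profile = X[0]                      # a single profile (x_1, ..., x_p)

# One response per row of X; both vectors below are made-up examples.
y_continuous = np.array([0.8, 1.3, 0.2, 2.1, 1.7])
y_labels = np.array(["responder", "non-responder", "responder",
                     "responder", "non-responder"])

print(X.shape, profile.shape)
print(problem_type(y_continuous), problem_type(y_labels))
```

With only 5 rows and 1000 columns, the matrix has far more features than samples, which is the setting that motivates feature selection.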

