Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us
Newsroom

Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets

Xin Li

Source Title: International Journal of Computational Models and Algorithms in Medicine (IJCMAM) 2(4)

DOI: 10.4018/jcmam.2011100102

OnDemand:

(Individual Articles)

Available

$37.50

Current Special Offers

No Current Special Offers

Abstract

In this paper, we present approaches to perform principal component analysis (PCA) clustering for distributed heterogeneous genomic datasets with privacy protection. The approaches allow data providers to collaborate together to identify gene profiles from a global viewpoint, and at the same time, protect the sensitive genomic data from possible privacy leaks. We then further develop a framework for privacy preserving PCA-based gene clustering, which includes two types of participants: data providers and a trusted central site (TCS). Two different methodologies are employed: Collective PCA (C-PCA) and Repeating PCA (R-PCA). The C-PCA requires local sites to transmit a sample of original data to the TCS and can be applied to any heterogeneous datasets. The R-PCA approach requires all local sites have the same or similar number of columns, but releases no original data. Experiments on five independent genomic datasets show that both C-PCA and R-PCA approaches maintain very good accuracy compared with the centralized scenario.

Article Preview

Top

Introduction

In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human genomic/genetic information, thus significantly improving biomedical research. However, these exciting advances in bioinformatics do come with a drawback: the increasingly richer human genomic/genetic data contains sensitive private information, such as genetic markers, diseases, etc., which may further lead to the discovery of the individual’s race, family, or even identity. Therefore, privacy is an important issue when dealing with bioinformatics data. This is further exacerbated when multiple data providers try to collaborate with each other.

Gene Expression and DNA Microarray

A DNA microarray (Wikipedia, 2010), which is the practical realized technology of the Gene Expression (BioChemWeb.org, 2010), is a multiplex technology used in molecular biology. It consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing picomoles (10^-12 moles) of a specific DNA sequence, known as probes (or reporters). This can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA sample (called target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine relative abundance of nucleic acid sequences in the target. Since an array can contain tens of thousands of probes, a microarray experiment can accomplish many genetic tests in parallel. Therefore arrays have dramatically accelerated many types of investigation. The microarray data processing pipeline (Hackl, Sanchez Cabo, Sturn, Wolkenhauer, & Trajanoski, 2004) includes a variety of statistical steps: pre-processing (including background correction, normalization, and summarization), differential analysis which contains raw p-value computation and false discovery rate (FDR) correction, and gene clustering / profiling analysis. Figure 1(a) shows that microarray experiment process in the lab and Figure 1(b) illustrates its gene clustering result.

Figure 1.

(a) DNA microarray experiment lab processing flow; (b) the gene clustering result (heatmap) of a microarray experiment

Gene Clustering on Collaborative Datasets on Vertical Partitions

Due to the fact that limited technical resources are available of a single research group or institution, researchers are often required to combine multiple gene expression datasets from different research labs/groups/institutions, and to conduct meta-analysis (Griffith et al., 2006; Lu, 2009; Ramasamy et al., 2008) or pooled studies (Szelinger et al., 2011; Wei et al., 2009), which can give them a big picture of gene profiling from a global perspective.

Another situation occurs when multiple labs, groups, and/or institutes perform different treatments on samples, seeking to formulate the gene profile for identical groups or genes from a global perspective. Several meta-analysis methods (Fishel et al., 2007; Yang et al., 2007) have been developed to handle such analysis, specifically if the same set of genes is studied under different treatments done at multiple sites. Successful cross-experiment gene clustering analyses have been completed on a variety of cancers (lung, breast, liver, etc.) to enhance the pathway (Dawany et al., 2010; Shen et al., 2010) or to build global gene profiles (Bianchi et al., 2007; Tseng et al., 2009). This situation results in a vertical partitioning scenario, where multiple datasets have the same rows, but different columns.

Complete Article List

Search this Journal:

Reset

Volume 4: 2 Issues (2014)

Volume 3: 4 Issues (2012)

Volume 2: 4 Issues (2011)

Volume 1: 4 Issues (2010)

View Complete Journal Contents Listing

MLA

APA

Chicago

Export Reference