Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets

Xin Li

Source Title: Methods, Models, and Computation for Medical Informatics

ISBN13: 9781466626539|ISBN10: 1466626534|EISBN13: 9781466626843

DOI: 10.4018/978-1-4666-2653-9.ch014

MLA

Li, Xin. "Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets." Methods, Models, and Computation for Medical Informatics, edited by Aryya Gangopadhyay, IGI Global, 2013, pp. 238-271. https://doi.org/10.4018/978-1-4666-2653-9.ch014

APA

Li, X. (2013). Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets. In A. Gangopadhyay (Ed.), Methods, Models, and Computation for Medical Informatics (pp. 238-271). IGI Global. https://doi.org/10.4018/978-1-4666-2653-9.ch014

Chicago

Li, Xin. "Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets." In Methods, Models, and Computation for Medical Informatics, edited by Aryya Gangopadhyay, 238-271. Hershey, PA: IGI Global, 2013. https://doi.org/10.4018/978-1-4666-2653-9.ch014

Abstract

In this paper, we present approaches for performing principal component analysis (PCA) clustering on distributed heterogeneous genomic datasets with privacy protection. The approaches allow data providers to collaborate in identifying gene profiles from a global viewpoint while protecting sensitive genomic data from possible privacy leaks. We further develop a framework for privacy-preserving PCA-based gene clustering that includes two types of participants: data providers and a trusted central site (TCS). Two methodologies are employed: Collective PCA (C-PCA) and Repeating PCA (R-PCA). C-PCA requires local sites to transmit a sample of their original data to the TCS and can be applied to any heterogeneous datasets. R-PCA requires that all local sites have the same or a similar number of columns, but releases no original data. Experiments on five independent genomic datasets show that both C-PCA and R-PCA maintain very good accuracy compared with the centralized scenario.
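The abstract does not spell out the R-PCA steps, so the following is only a minimal sketch of one plausible reading, not the chapter's algorithm. It assumes each data provider runs PCA locally and sends only its leading principal directions (scaled by their singular values) to the TCS, which stacks them and repeats the decomposition. The function names `local_summary` and `repeat_pca`, the parameter `k`, and the synthetic data are all hypothetical, introduced purely for illustration.

```python
import numpy as np


def local_summary(X, k):
    """Hypothetical provider-side step: return k scaled principal
    directions of the column-centered data instead of the raw rows."""
    Xc = X - X.mean(axis=0)                      # center each column locally
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return s[:k, None] * vt[:k]                  # shape (k, n_columns)


def repeat_pca(summaries, k):
    """Hypothetical TCS-side step: stack the local summaries and
    decompose them again to estimate k global directions."""
    stacked = np.vstack(summaries)               # needs equal column counts
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]                                # shape (k, n_columns)


# Synthetic stand-in data: three sites with different row counts
# but the same six columns (an assumption of this sketch).
rng = np.random.default_rng(0)
sites = [rng.normal(size=(n, 6)) for n in (40, 55, 30)]

summaries = [local_summary(X, k=3) for X in sites]   # only these leave a site
global_pcs = repeat_pca(summaries, k=2)

# Each site can project its own rows onto the shared directions
# and cluster the resulting low-dimensional scores locally.
scores = [(X - X.mean(axis=0)) @ global_pcs.T for X in sites]
```

Stacking the summaries is what makes the equal-column requirement visible: `np.vstack` fails if the sites disagree on the number of columns. Whether scaled directions alone leak anything sensitive is a separate question that this sketch does not address.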
