Privacy-Preserving Outsourced Similarity Search

Stepan Kozak (Masaryk University, Brno, Czech Republic), David Novak (Masaryk University, Brno, Czech Republic) and Pavel Zezula (Masaryk University, Brno, Czech Republic)
Copyright: © 2014 | Pages: 24
DOI: 10.4018/jdm.2014070103

The general trend in data management is to outsource data to third-party systems that provide data retrieval as a service. This approach naturally raises privacy concerns about the (potentially sensitive) data. Recently, quite extensive research has been done on privacy-preserving outsourcing of traditional exact-match and keyword search. However, little attention has been paid to outsourcing of similarity search, which is essential for content-based retrieval in current multimedia, sensor or scientific data. In this paper, the authors propose a scheme for outsourcing similarity search. They define evaluation criteria for such systems with an emphasis on usability, privacy and efficiency in real applications. These criteria can serve as a general guideline for practical system analysis, and the authors use them to survey and compare existing approaches. As the main result, the authors propose a novel dynamic similarity index, the EM-Index, that works for an arbitrary metric space and ensures data privacy, which makes it suitable for search systems outsourced to, for example, a cloud environment. In comparison with other approaches, the index is fully dynamic (update operations are efficient) and it aims to transfer as much load as possible from the clients to the server.
Article Preview


With the rapid growth in the volume and diversity of digital data produced by all kinds of commercial, scientific and leisure-time applications, retrieval in large data sets has become one of the key IT tasks. Complex data types, such as multimedia or various sensor data, naturally need to be searched not only by their metadata but also by the content of the data itself. This is typically beyond the capabilities of classic exact-match or keyword search techniques, which is one of the reasons why the importance of various similarity search technologies is increasing significantly in current applications. Considerable research effort has been invested in this topic, resulting in both a theoretical background (Zezula et al., 2006) and large-scale practical results, e.g. (Novak et al., 2009; Esuli, 2011; Batko et al., 2010).

However, similarity search is a very resource-demanding process and the underlying technologies are rather complex. Therefore, there is a strong motivation to develop a general method that provides similarity search as a service and makes it easily available to authorized users. With the growing popularity of cloud services, the natural approach is to outsource this task to a cloud environment in a Software as a Service (SaaS) manner. This approach offers many advantages for the owners of the data, such as low initial investment, low storage costs and very good scalability. Outsourcing should also transfer the computational burden from the clients to the servers, which would allow the clients to be simpler devices (such as cell phones).

On the other hand, service outsourcing fundamentally assumes that the data is handed over to third-party repositories that are not fully controlled by the users authorized for the service. The outsourced data may be sensitive (e.g. medical data), confidential, or otherwise valuable (e.g. collected in scientific research), and thus the privacy of the data is of high importance. Hence, besides providing effective and efficient searching, outsourced similarity search solutions should also employ mechanisms that meet privacy requirements not only through standard access permissions but, more importantly, by securing the content of the indexed data in a potentially hostile third-party environment.

A typical application area for outsourced similarity search is healthcare. Healthcare data is often heterogeneous and complex (X-rays, MRI outputs, various electric signals), which calls for content-based retrieval using similarity search. At the same time, the data managed by healthcare institutions is often extremely voluminous, and thus outsourcing the data management can be more convenient than maintaining their own complex IT infrastructure. However, privacy of patient-related data is required by law, and thus the whole process must be secure and privacy-preserving (Wei et al., 2013).

Ensuring privacy of outsourced search is a widely studied topic in the context of classic databases. Much of the focus has been on so-called symmetric searchable encryption schemes, which can form the building blocks of secure cloud storage. Such schemes encrypt the data in a way that allows selective data retrieval (search) over the encrypted collection while data privacy is preserved. Recently, Kamara et al. (2010) described the requirements of practical secure cloud storage for the general case of relational databases, and they proposed a symmetric searchable encryption scheme for this scenario (Kamara et al., 2012). A prototype system providing privacy-preserving outsourced SQL search has also been implemented recently (Tu et al., 2013).
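To make the idea of searching over an encrypted collection concrete, the following is a minimal sketch of exact-match keyword search with keyed tokens. It is not the scheme of Kamara et al. or the system of Tu et al.; it is a toy illustration under simplifying assumptions. The function names (`keyword_token`, `build_index`, `server_search`) and the sample data are hypothetical, the record contents are assumed to be encrypted separately and are represented here only by opaque identifiers, and the deterministic tokens reveal which queries repeat, which real schemes treat as leakage to be controlled.

```python
import hashlib
import hmac
from collections import defaultdict


def keyword_token(key: bytes, keyword: str) -> str:
    """Client side: derive a deterministic search token with HMAC-SHA256."""
    normalized = keyword.lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def build_index(key: bytes, documents: dict) -> dict:
    """Client side: map each keyword token to the ids of matching records.

    `documents` maps an opaque record id to its list of plaintext keywords.
    Only the resulting token index is handed to the server.
    """
    index = defaultdict(list)
    for doc_id, keywords in documents.items():
        for kw in set(keywords):
            index[keyword_token(key, kw)].append(doc_id)
    return dict(index)


def server_search(index: dict, token: str) -> list:
    """Server side: look up a token without ever seeing the keyword or key."""
    return sorted(index.get(token, []))


# Hypothetical sample data; the key stays with the data owner and clients.
secret_key = b"client-side secret key (example only)"
records = {
    "rec-1": ["xray", "chest"],
    "rec-2": ["mri", "brain"],
    "rec-3": ["xray", "hand"],
}
outsourced_index = build_index(secret_key, records)
matches = server_search(outsourced_index, keyword_token(secret_key, "Xray"))
```

The server stores only HMAC outputs, so it can answer exact-match queries but cannot derive tokens for keywords of its own choosing without the key. Similarity queries do not fit this pattern directly, because a similar object generally produces an unrelated token.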

These general principles can also be applied in the context of similarity search; however, this area has several specifics that make outsourcing more difficult. First of all, similarity search often deals with more than “one level” of the data to be processed: the raw data (for example a set of images) is typically preprocessed to obtain descriptive features (descriptors) that are indexed and searched (for example SIFT features from images (Lowe, 1999)). From the privacy point of view, it is crucial to ensure privacy of both the raw data and the indexed data objects (descriptors), which can be highly correlated with the raw data objects.
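The two levels of data can be illustrated with a small, unprotected sketch: raw objects are reduced to descriptors, and the search runs only on the descriptors under a metric distance. This is not the EM-Index and it involves no privacy protection; it only shows the plain pipeline that an outsourcing scheme has to secure. The byte-histogram extractor `extract_descriptor`, the linear-scan `knn` function and the sample objects are hypothetical stand-ins (a real system would use features such as SIFT and a proper metric index), and Euclidean distance is assumed as the metric.

```python
import heapq
import math


def extract_descriptor(raw: bytes, bins: int = 4) -> tuple:
    """Level 2: reduce a raw object to a normalized histogram of byte values.

    Hypothetical stand-in for a real feature extractor.
    """
    counts = [0] * bins
    for value in raw:
        counts[value * bins // 256] += 1
    total = max(len(raw), 1)
    return tuple(c / total for c in counts)


def knn(query: tuple, descriptors: dict, k: int) -> list:
    """Return the ids of the k descriptors nearest to the query.

    Uses a linear scan with Euclidean distance (math.dist, Python 3.8+).
    """
    return heapq.nsmallest(
        k, descriptors, key=lambda oid: math.dist(query, descriptors[oid])
    )


# Level 1: raw objects (hypothetical byte strings standing in for images).
raw_objects = {
    "dark": bytes([10] * 8),
    "bright": bytes([250] * 8),
    "dim": bytes([20] * 6 + [100] * 2),
}

# Only descriptors are indexed; the raw objects are fetched by id afterwards.
index = {oid: extract_descriptor(raw) for oid, raw in raw_objects.items()}
nearest = knn(extract_descriptor(bytes([5] * 8)), index, k=2)
```

Even in this toy setting the descriptor of an object already reveals its rough brightness distribution, which is why the descriptors need protection in addition to the raw data.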
