RSSMSO Rapid Similarity Search on Metric Space Object Stored in Cloud Environment

Raghavendra S. (University Visvesvaraya College of Engineering, Bangalore, India), Nithyashree K. (University Visvesvaraya College of Engineering, Bangalore, India), Geeta C.M. (University Visvesvaraya College of Engineering, Bangalore, India), Rajkumar Buyya (University of Melbourne, Melbourne, Australia), Venugopal K. R. (University Visvesvaraya College of Engineering, Bangalore, India), S. S. Iyengar (Florida International University, Miami, FL, USA) and L. M. Patnaik (National Institute of Advanced Studies, Bangalore, India)
DOI: 10.4018/IJOCI.2016070103


This paper considers a cloud computing environment in which the data owner outsources the similarity search service to a third-party service provider. Privacy of the outsourced data is important because the data may be confidential. The data should be available to authorized client groups but must not be revealed to the service provider that stores it. For this scenario, the paper presents a technique called RSSMSO, which has a build phase, a query phase, a data transformation phase and a search phase. The build phase and the query phase upload and query the data, respectively; the data transformation phase transforms the data before it is submitted to the service provider, so that similarity queries run on the transformed data; the search phase finds objects similar to the query object. The RSSMSO technique provides enhanced query accuracy with low communication cost. Experiments carried out on real data sets show that the proposed work is capable of providing privacy and achieving accuracy at a low cost in comparison with FDH.

1. Introduction

Digital data produced by all kinds of commercial, scientific and leisure-time applications is growing rapidly in volume and diversity, and searching for a desired item in such voluminous data sets is a tedious task. Complex data types, such as sensor data, time series data and gene sequence data, introduce a natural requirement for search. It is difficult to search such data using typical keyword search techniques; hence similarity search (Zezula et al., 2006) comes into the picture. With the growing popularity of cloud services, the natural approach is to outsource this task to the cloud environment. Service outsourcing means that the data is provided to third-party repositories that are not controlled by the data owner. The outsourced data may be sensitive and confidential (e.g. medical data) or valuable (e.g. collected from scientific research (Cheng and Church, 2000; Hubble et al., 2009)), and thus the privacy of the data is of great importance.

The concept of similarity search (Zezula et al., 2006; Raghavendra et al., 2015) is applicable to a wide range of data and to an unlimited number of similarity functions. A scientist can search time series collected on an hourly or weekly basis for similar patterns that indicate an interesting phenomenon. Similarity search can be used to analyse DNA patterns in order to understand genes or gene groups. It is most prominently used in the field of health care: content-based retrieval (Pepsi and Mala, 2013) using similarity search is helpful for healthcare data such as X-rays, MRT outputs and various complex electric signals. New similarity search applications are constantly being developed, ranging from language translation systems to intellectual property protection.

Standard search techniques lie at the core of similarity search, and an unlimited number of (dis)similarity functions can be used with a wide variety of data types. A similarity query typically contains a query object, and the search should return the data objects that are most similar to the query according to the specified function.

In our work we mainly focus on similarity search based on the metric space model. The metric space is an ordered pair $\mathcal{M} = (\mathcal{D}, d)$, where $\mathcal{D}$ is a domain of data objects and $d$ is a total distance function $d : \mathcal{D} \times \mathcal{D} \to \mathbb{R}$ satisfying the metric postulates of non-negativity, identity, symmetry, and triangle inequality. The set of indexed objects $X \subseteq \mathcal{D}$ is typically searched by the query-by-example paradigm, for instance by the range query $R(q, r)$ or by the nearest neighbours query $k\text{-NN}(q)$ covering $k$ objects from $X$ with the smallest distances to given $q \in \mathcal{D}$ (Kozak, Novak and Zezula, 2012).
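The range query and the nearest neighbours query described above can be illustrated with a minimal Python sketch. It is a plain linear scan over unencrypted objects, not the RSSMSO technique itself: it involves no data transformation, no indexing and no privacy protection. The names `euclidean`, `range_query` and `knn_query`, the choice of Euclidean distance as the metric, and the sample points are all assumptions made for illustration only.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance, one example of a function that
    satisfies the metric postulates."""
    return math.dist(a, b)


def range_query(X: Sequence[Point], q: Point, r: float,
                d: Distance = euclidean) -> List[Point]:
    """Return every object of X whose distance to q is at most r."""
    return [o for o in X if d(q, o) <= r]


def knn_query(X: Sequence[Point], q: Point, k: int,
              d: Distance = euclidean) -> List[Point]:
    """Return the k objects of X with the smallest distances to q,
    ordered from nearest to farthest."""
    return heapq.nsmallest(k, X, key=lambda o: d(q, o))


# Hypothetical sample data: four points in the plane.
X = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 0.0)

in_range = range_query(X, q, 5.0)  # objects within distance 5 of q
nearest = knn_query(X, q, 2)       # the two objects closest to q
```

Both functions evaluate the distance once per stored object, so their cost grows linearly with the size of the data set. Metric indexes avoid part of that work by using the triangle inequality to prune objects that cannot satisfy the query.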
