1. Introduction
The volume and diversity of digital data produced by commercial, scientific and leisure-time applications are growing rapidly, and searching for a desired item in such voluminous data sets is a tedious task. Complex data types, such as sensor data, time series data and gene sequence data, introduce a natural requirement for search. It is difficult to search such data using typical keyword search techniques; hence similarity search (Zezula et al., 2006) comes into the picture. With the growing popularity of cloud services, a natural approach is to outsource this task to the cloud environment. Service outsourcing means that the data is provided to third-party repositories that are not controlled by the data owner. The outsourced data may be sensitive and confidential (e.g. medical data) or valuable (e.g. collected in scientific research (Cheng and Church, 2000; Hubble et al., 2009)), and thus the privacy of the data becomes a primary concern.
The concept of similarity search (Zezula et al., 2006; Raghavendra et al., 2015) is applicable to a wide range of data and to an effectively unlimited number of similarity functions. A scientist can search time series collected on an hourly or weekly basis for similar patterns that indicate an interesting phenomenon. Similarity search can be used to analyse DNA patterns in order to understand genes or gene groups. It is prominently used in the field of health care, where content-based retrieval (Pepsi and Mala, 2013) using similarity search is helpful for data such as X-rays, MRT outputs and various complex electrical signals. New similarity search applications are constantly being developed, ranging from language translation systems to intellectual property protection.
Standard search techniques lie at the core of similarity search, and an effectively unlimited number of (dis)similarity functions can be used with a wide variety of data types. A similarity query typically contains a query object, and the search should return the data objects that are most similar to the query according to the specified function.
In our work we mainly focus on similarity search based on the metric space model. A metric space is an ordered pair $(\mathcal{D}, d)$, where $\mathcal{D}$ is a domain of data objects and $d : \mathcal{D} \times \mathcal{D} \to \mathbb{R}$ is a total distance function satisfying the metric postulates of non-negativity, identity, symmetry, and triangle inequality. The set $X \subseteq \mathcal{D}$ of indexed objects is typically searched by the query-by-example paradigm, for instance by the range query $R(q, r)$ or by the nearest neighbours query $k$-NN$(q)$ covering the $k$ objects from $X$ with the smallest distances to a given $q \in \mathcal{D}$ (Kozak, Novak and Zezula, 2012).
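To make the two query types concrete, the following is a minimal Python sketch of a range query and a nearest neighbours query over a small set of indexed objects. It is only an illustration under stated assumptions, not the indexing method discussed in this article: the names `X` (indexed objects), `q` (query object), `r` (query radius) and `k` (number of neighbours), the helper functions, and the choice of Euclidean distance over 2-D points are all assumed here, and the search is a plain linear scan rather than a metric index.

```python
import heapq
import math
from typing import Callable, List, Tuple

# Assumed data domain for this sketch: points in the plane.
Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, one example of a function satisfying the metric postulates."""
    return math.dist(a, b)


def range_query(X: List[Point], q: Point, r: float,
                d: Distance = euclidean) -> List[Point]:
    """Return every indexed object whose distance to q is at most r (linear scan)."""
    return [o for o in X if d(q, o) <= r]


def knn_query(X: List[Point], q: Point, k: int,
              d: Distance = euclidean) -> List[Point]:
    """Return the k indexed objects with the smallest distances to q (linear scan)."""
    return heapq.nsmallest(k, X, key=lambda o: d(q, o))


# Hypothetical example data.
X = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 0.0)

in_range = range_query(X, q, r=5.5)  # objects within distance 5.5 of q
nearest = knn_query(X, q, k=2)       # the two objects closest to q
```

Because both functions take the distance `d` as a parameter, any other function that satisfies the metric postulates can be substituted without changing the query logic. A practical system would replace the linear scan with a metric index that uses the triangle inequality to prune the search.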