G-Hash: Towards Fast Kernel-Based Similarity Search in Large Graph Databases

G-Hash: Towards Fast Kernel-Based Similarity Search in Large Graph Databases

Xiaohong Wang (University of Kansas, USA), Jun Huan (University of Kansas, USA), Aaron Smalter (University of Kansas, USA) and Gerald H. Lushington (University of Kansas, USA)
Copyright: © 2012 |Pages: 38
DOI: 10.4018/978-1-61350-053-8.ch008
OnDemand PDF Download:
No Current Special Offers


Our objective in this chapter is to enable fast similarity search in large graph databases with graph kernel functions. In particular, we propose (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. In our method, we use a hash table to support efficient storage and fast search of the extracted local features from graph data. Using the hash table, we have developed a graph kernel function to capture the intrinsic similarity of graphs and for fast similarity query processing. We have demonstrated the utility of the proposed methods using large chemical structure graph databases.
Chapter Preview


Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, component search (e.g. subgraph/ supergraph search) and similarity search. Among these structured data, the use of graph representations has gained popularity in pattern recognition and machine learning. The main advantage of graphs is that they offer a powerful way to represent structured data. Querying and mining of the graphs can contribute to our understanding in numerous ways: understanding of new connectivity patterns, evolutionary changes and discovery of topological features. Queries in graph databases can be broadly classified into two categories: (i) subgraph query and (ii) similarity query. The aim of subgraph query is to identify a set of graphs that contain a query graph. The aim of similarity query is to identify similar graphs in a graph database to a query, according to a distance metric. There are two types of similarity query, i.e. k-NNs query and range query. In k-NNs query, the k most similar graphs are reported. In range query, all graphs within a predefined distance to the query graph are reported. In this chapter, we address the problem of k-NNs similarity search in large database of graphs.

Complete Chapter List

Search this Book: