A Truss-Based Framework for Graph Similarity Computation

The study of graph kernels has been an important area of graph analysis; graph kernels are widely used to measure similarity between graphs. Most existing graph kernels consider either local or global properties of a graph, and there are few studies on multiscale graph kernels. In this article, the authors propose a framework for graph kernels based on truss decomposition, which allows multiple graph kernels, and indeed any graph comparison algorithm, to compare graphs at different scales. The authors utilize this framework to derive variants of five graph kernels and compare them with the corresponding basic graph kernels on graph classification tasks. Experiments on a large number of benchmark datasets demonstrate the effectiveness and efficiency of the proposed framework.

Graph classification has been widely studied in many application domains. For example, in protein function prediction, it is a common task to determine whether a protein is an enzyme or a non-enzyme (Borgwardt & Kriegel, 2005). A similar task on a collaboration dataset is to predict which genre an ego-network of an actor/actress belongs to (Yanardag & Vishwanathan, 2015b). Since there are many ways to represent images as graphs, many computer vision tasks, such as classifying images (Tian & Han, 2022), can also be considered graph classification problems.
To date, kernel methods are among the most popular approaches for comparing similarities between structured objects, especially in the domain of graph classification. Graph kernels have achieved state-of-the-art results on many graph datasets. In general, kernel methods utilize a positive semidefinite kernel function to measure the similarity between two objects, which can be expressed as an inner product in a reproducing kernel Hilbert space (Schölkopf & Smola, 2001). Some graph kernels are computed based on local properties, such as the neighborhood features of vertices (Hido & Kashima, 2009) or small substructures of graphs (i.e., trees, cycles, graphlets) (Horváth et al., 2004; Orsini et al., 2015; Shervashidze et al., 2009, 2011). There are also graph kernels built from a global perspective, focusing on the global properties of graphs (Kang et al., 2012; Nikolentzos et al., 2017; Wu et al., 2019). Most existing graph kernels cannot capture graph similarity at multiple levels of scale. The Multiscale Laplacian Graph Kernel (Kondor & Pan, 2016) is the first graph kernel that can compare graphs at multiple different scales, by building a hierarchy of nested subgraphs. However, it requires Laplacian matrix operations on the graph and thus has a very high runtime.
In the real world, many graphs have different structures at different scales. For example, a molecule has small structures, such as specific chemical bonds, and an overall structure, such as a chain or a ring. By extracting the structure of a graph at different scales, it is possible to reflect the relationship of local structures to the global structure. Therefore, it would be desirable to find an efficient way to reveal the structure of a graph at different scales. Considering that most real-world graphs are sparse overall and dense in parts, the dense regions may reflect a higher frequency of interactions between vertices and greater similarity to each other (Lotito & Montresor, 2020). Hence, the density information of a graph can be used to reveal its structure. In the literature, many notions of cohesive subgraphs have been proposed to explore dense regions. The k-truss is a popular type of cohesive subgraph that has been studied in recent years. It has been applied to community search (Katunka et al., 2017; Yu et al., 2020), complex network visualization (Zhao & Tung, 2012), and other applications. The k-truss of G is the largest subgraph of G in which every edge is contained in at least k − 2 triangles within the subgraph (J. D. Cohen, 2008). It is defined based on the triangle, which denotes the existence of a stable relation among three nodes, and the value of k reflects the density of the subgraph. The k-truss decomposition procedure finds the k-trusses for all possible k on the graph, which leads to hierarchical subgraphs that represent the dense regions of a graph at different scales. In this way, it is possible to reveal graph structure at different scales. To the best of the authors' knowledge, graph kernels combined with the k-truss have not yet been studied in the literature.
In this article, a new framework is proposed for graph similarity. The authors take advantage of the k-truss decomposition to obtain hierarchical structures of G at different scales, then add up the similarity between the corresponding subgraphs according to the hierarchy. The combined result provides a more accurate representation of graph similarity. More specifically, the contributions of this article are as follows:
• The authors first propose a framework for comparing graphs at multiple different scales, utilizing the k-truss decomposition of graphs to build a hierarchy of nested subgraphs. The framework can be applied to any method used for graph comparison.
• The framework generates valid truss variants of several base graph kernels, which enhances the expressivity of the base graph kernels for complex substructures. In particular, the framework improves the performance of the base graph kernels on graph classification tasks.
• The authors evaluate the framework on a set of benchmark datasets. The truss variants perform well and achieve significant improvements over the base kernels in most cases.
The rest of this article is organized as follows. In Section 2, the authors review some related work in the literature. In Section 3, the authors review some preliminary concepts and give details about truss degeneracy and k-truss decomposition. In Section 4, the authors introduce the proposed framework for graph similarity. In Section 5, the authors implement and evaluate the proposed framework on several benchmark datasets. Section 6 concludes the article.

BACKGROUND
In this section, the related work is presented in terms of both graph kernel studies and k-truss mining.

Graph Kernels
One of the most popular frameworks for constructing kernels is R-convolution (Haussler, 1999), where the key idea is to decompose graphs into substructures and add up the pairwise similarities between them. Specific substructures focus on different structural aspects of graphs, such as random walks (Kang et al., 2012; Kashima, 2003), graphlets (Shervashidze et al., 2009), cycles (Horváth et al., 2004), paths (Borgwardt & Kriegel, 2005), and subtrees (B. Li et al., 2012; Morris et al., 2017). Besides the family of R-convolution kernels, assignment kernels have received a lot of attention recently. This framework maximizes the similarity of two graphs by computing a matching between their substructures; for example, (Nikolentzos et al., 2017) applies the pyramid match kernel to match the node embeddings of graphs. There are also frameworks that work on top of graph kernels and aim to improve the performance of base kernels. The deep graph kernels framework (Yanardag & Vishwanathan, 2015b) and the structural smoothing framework (Yanardag & Vishwanathan, 2015a) are inspired by natural language processing. R-convolution suffers from diagonal dominance: few substructures are common across graphs, causing each graph to be similar mostly to itself and not to other graphs. These two frameworks were developed to address the diagonal dominance problem. The core framework (Nikolentzos et al., 2018) is another example of improving the performance of graph kernels by taking structure at multiple different scales into account. The k-core of G is the largest subgraph of G in which every vertex has at least k neighbors within the subgraph (Jin et al., 2018; Luo, Yu, Cai, et al., 2021; Luo, Yu, Zheng, et al., 2021). The core framework utilizes the well-known k-core decomposition of graphs to build a hierarchy of nested subgraphs (Hua et al., 2020; Luo et al., 2019; N. Wang et al., 2017; Zhou et al., 2021). It is applicable to existing graph kernels, as well as to any graph comparison algorithm.

K-Truss Mining
The k-truss (J. D. Cohen, 2008) is the largest subgraph of a graph in which every edge is contained in at least (k − 2) triangles within the subgraph. The k-truss decomposition (J. Wang & Cheng, 2012) finds the k-trusses for all possible values of k in the graph. (J. D. Cohen, 2008) proposed a base algorithm for truss decomposition, which is a typical in-memory truss decomposition algorithm. (J. Cohen, 2009) used the MapReduce framework to perform truss decomposition in parallel, but the proposed method is relatively inefficient when dealing with large graphs. (Chen et al., 2014) first improved the existing distributed k-truss decomposition in the MapReduce framework and designed an algorithm based on graph-parallel abstraction. (J. Wang & Cheng, 2012) modified the algorithm proposed by (J. D. Cohen, 2008) and proposed an efficient in-memory algorithm and two I/O-efficient algorithms for truss decomposition in massive networks. Furthermore, many researchers have studied the k-truss on multiple types of graphs, including directed graphs (Liu et al., 2020), probabilistic graphs (Huang et al., 2016), and public-private networks (Ebadian & Huang, 2019). For truss maintenance, (Huang et al., 2014) proposed a simple truss maintenance algorithm on dynamic graphs with single-edge operations, and (Akbas & Zhao, 2017) proposed an index structure, EquiTruss, to accelerate community search, which can be efficiently updated dynamically. A further line of work is the first to study the batch processing of truss maintenance in large graphs, with an innovative solution for multi-edge insertions/deletions.

Graph Kernel Theory
In this section, the authors first describe the kernel function and kernel matrix, and then an overview of several graph kernels is provided. Let k(G, G') = <f(G), f(G')>, where the operation between f(G) and f(G') is the Euclidean dot product in R^d. Given a set of graphs {G_1, ..., G_n} and a kernel function k, the element K_ij = k(G_i, G_j) of a kernel matrix K is calculated for 1 ≤ i, j ≤ n. The kernel matrix must be positive semidefinite (PSD), as this is a necessary condition for solving some convex optimization problems (including SVM) in kernel-based methods. One way to check whether a kernel matrix is PSD is to verify that all of its eigenvalues are non-negative. Furthermore, it is noted that kernel functions are additive, that is, a linear combination of two valid kernels is also valid. This can be formalized as k(G, G') = c·k_1(G, G') + (1 − c)·k_2(G, G') for 0 < c < 1. A sophisticated kernel can therefore be defined as a linear combination of valid kernels. A good graph kernel should be both effective and efficient: highly expressive of the graph structure, with low computational complexity in both time and space.
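To make the validity check concrete, the following is a minimal Python sketch (the helper name `is_psd` and the toy matrices are ours, not the authors'): it verifies positive semidefiniteness via the eigenvalue test and illustrates the additivity property with a convex combination of two kernel matrices.

```python
import numpy as np

def is_psd(K, tol=1e-10):
    """A kernel matrix is valid iff it is symmetric and all eigenvalues are >= 0."""
    return bool(np.allclose(K, K.T) and np.min(np.linalg.eigvalsh(K)) >= -tol)

# Two toy kernel matrices over the same three graphs.
K1 = np.array([[2.0, 1.0, 0.0],
               [1.0, 2.0, 1.0],
               [0.0, 1.0, 2.0]])
K2 = np.eye(3)

# Additivity: a convex combination c*K1 + (1 - c)*K2 with 0 < c < 1 is again PSD.
c = 0.3
K = c * K1 + (1 - c) * K2
```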
The authors have selected several graph kernels for their experiments, which are described below.
• Graphlet Kernel (GR) (Shervashidze et al., 2009). A graphlet is a small non-isomorphic subgraph. Let {g_1, g_2, ..., g_d} be the set of graphlets of size k, and let f_G ∈ R^d be a vector whose i-th element represents the number of occurrences of g_i divided by the total number of graphlets in G. Formally, the graphlet kernel is defined as k(G, G') = <f_G, f_G'>, where the operation between f_G and f_G' is the Euclidean dot product.
• Shortest Path Kernel (SP) (Borgwardt & Kriegel, 2005). The first step of the kernel is to transform the original graphs into shortest-path graphs using a shortest path algorithm (e.g., Floyd's algorithm). Given G, the shortest-path graph S has the same vertex set as G. If vertices u and v are connected by a walk in G, there is an edge (u, v) in S, labeled by the length of the shortest path from u to v in G. Let {p_1, ..., p_d} be the set of all shortest paths in G, each denoted as a triplet of the labels of the two endpoints and the length of the path. Formally, the shortest path kernel is defined as k(G, G') = <p_G, p_G'>, where p_G is a vector whose i-th element reflects the occurring frequency of p_i in G.
• Weisfeiler-Lehman Subtree Kernel (WL) (Shervashidze et al., 2011). The kernel is based on the Weisfeiler-Lehman test of graph isomorphism. Let G, G' be two graphs and Σ_i be the set of labels that appear in the two graphs at the end of the i-th iteration of the WL algorithm; in particular, Σ_0 is the original label set. In each iteration, the label of a node and the labels of its neighboring nodes are merged into a multiset label, which is then given a new compressed label. The Weisfeiler-Lehman subtree kernel is defined as k(G, G') = <f(G), f(G')>, where f(G) is a vector of h + 1 blocks, denoted block 0 to block h, and the i-th element in the j-th block reflects the occurring frequency of the corresponding label in the j-th iteration.
• Neighborhood Hash Kernel (NH) (Hido & Kashima, 2009). For labeled graphs, the kernel replaces the discrete node labels with unique binary representations, then uses logical operations (XOR and ROT) to compute the neighborhood hash of each vertex. Furthermore, the count-sensitive neighborhood hash takes into account the number of neighbors with the same label using additional logical operations. The kernel is then defined as k(G, G') = c / (|V| + |V'| − c), where c is the number of common (hashed) labels in the two graphs and |V|, |V'| are their numbers of vertices.
• Pyramid Match Graph Kernel (PM) (Nikolentzos et al., 2017). The pyramid match graph kernel first embeds the vertices of the input graphs in a vector space and partitions the feature space into regions of increasingly larger size. It then counts the matches between the two graphs' vertex embeddings at each level of the partition, weighting matches found at finer levels more highly than those found at coarser levels.
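To make the graphlet idea above concrete, the following is a minimal Python sketch for unlabeled graphs and graphlets of size 3 (the four possible induced subgraphs on three vertices, indexed by their number of edges); the function names are ours, and this is an illustration rather than the kernel's reference implementation.

```python
from itertools import combinations

def graphlet3_features(adj, nodes):
    """Normalized counts of the four size-3 graphlets, indexed by the number
    of induced edges (0, 1, 2, or 3) among each triple of vertices."""
    counts = [0, 0, 0, 0]
    for trio in combinations(nodes, 3):
        edges = sum(1 for u, v in combinations(trio, 2) if v in adj[u])
        counts[edges] += 1
    total = sum(counts) or 1          # avoid division by zero on tiny graphs
    return [c / total for c in counts]

def graphlet_kernel(f_G, f_H):
    """Euclidean dot product of the two normalized graphlet count vectors."""
    return sum(a * b for a, b in zip(f_G, f_H))
```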
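Likewise, the relabeling step at the heart of the Weisfeiler-Lehman subtree kernel can be sketched as follows (a simplified, illustrative version with adjacency sets; the function name is ours): each node's label is combined with the sorted multiset of its neighbors' labels and compressed to a fresh integer label.

```python
def wl_iteration(adj, labels):
    """One Weisfeiler-Lehman step: combine each node's label with the sorted
    multiset of its neighbors' labels, then compress to fresh integer labels."""
    compound = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                for v in adj}
    table = {}  # maps each distinct compound label to a new integer label
    return {v: table.setdefault(lbl, len(table)) for v, lbl in compound.items()}
```

Running this h times and counting label frequencies per iteration yields the h + 1 blocks of the feature vector described above.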

Truss Degeneracy and Truss Decomposition
In this part, the authors first give an introduction to the k-truss and truss degeneracy, then give details about the truss decomposition algorithm. Table 1 summarizes the symbols used in this article. A k-truss is a connected subgraph of a graph in which every edge is contained in at least (k − 2) triangles (J. D. Cohen, 2008). A triangle in G is a cycle of length 3; the triangle with vertices u, v, w is denoted by Δuvw, and the set of triangles of G is denoted by Δ_G. The support of an edge e, sup(e), is the number of triangles of G that contain e. The trussness of an edge is the largest k such that the edge belongs to the k-truss, and the truss degeneracy w*(G) is the largest k for which the k-truss of G is non-empty. An edge belonging to the (k + 1)-truss obviously belongs to the k-truss, so the trusses form a nested chain from the 2-truss down to the w*(G)-truss. This nested nature makes truss decomposition a very effective tool for discovering graph hierarchies, and since higher-order trusses are nested within lower-order trusses, an intuitive idea is that the trusses can be calculated in the reverse order of the nested chain.
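The definitions above can be made concrete with a small sketch (graphs as adjacency sets; the helper names are ours, and the connectivity requirement of a k-truss is omitted for brevity):

```python
def edge_support(adj, u, v):
    """sup(e): the number of triangles containing edge (u, v), i.e. the
    number of common neighbors of u and v."""
    return len(adj[u] & adj[v])

def is_k_truss(adj, k):
    """Check the edge condition of a k-truss: every edge lies in at least
    k - 2 triangles (the connectivity requirement is not checked here)."""
    return all(edge_support(adj, u, v) >= k - 2
               for u in adj for v in adj[u] if u < v)

# A 4-clique: every edge has support 2, so it is a 4-truss but not a 5-truss.
clique4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
```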

The traditional batch k-truss decomposition algorithm was proposed in (J. D. Cohen, 2008). The algorithm repeatedly removes all edges with support less than k − 2 until there are no more edges to remove, and finally obtains the k-truss. Since the time complexity of searching for Δuvw in this algorithm is high, (J. Wang & Cheng, 2012) proposed an improved in-memory algorithm that runs in O(m^1.5) time using O(m + n) memory space.
In Algorithm 1, the algorithm initially uses an in-memory triangle counting algorithm (Latapy, 2008) to compute the support of every edge and sorts the edges in ascending order of support. For each k, the edges with support less than k − 2 are deleted from G, and each deletion subtracts 1 from the support of the other two edges of every triangle containing the deleted edge. The edges whose support values are updated are then reordered. When there are no more edges to delete, G is output as T_k.
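The peeling idea behind truss decomposition can be sketched in a few lines of Python (a simplified, unoptimized illustration, not the authors' Algorithm 1): repeatedly remove an edge of minimum support, record its trussness, and update the supports of the edges that shared a triangle with it.

```python
def truss_decomposition(adj):
    """Return the trussness of every edge by peeling: repeatedly remove an
    edge of minimum support and decrement the supports of the edges that
    formed a triangle with it."""
    adj = {u: set(vs) for u, vs in adj.items()}              # work on a copy
    edges = {(u, v) for u in adj for v in adj[u] if u < v}
    support = {(u, v): len(adj[u] & adj[v]) for u, v in edges}
    trussness, k = {}, 2
    while edges:
        u, v = min(edges, key=support.get)                   # min-support edge
        k = max(k, support[(u, v)] + 2)                      # k never decreases
        for w in adj[u] & adj[v]:                            # broken triangles
            for e in ((min(u, w), max(u, w)), (min(v, w), max(v, w))):
                support[e] -= 1
        trussness[(u, v)] = k
        edges.remove((u, v))
        adj[u].discard(v)
        adj[v].discard(u)
    return trussness
```

Grouping edges by trussness then directly yields the nested chain of k-trusses.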
For some base graph kernels, the nested nature of the truss can be used to avoid some calculations. For example, for the Weisfeiler-Lehman subtree kernel, given two trusses of a graph T_i and T_j with i < j, all the subtrees found in T_j will also be present in T_i. Some calculations can therefore be performed only once, instead of once at each order of truss.

TRUSS FRAMEWORK
In this section, the authors propose a new framework based on k -truss for graph similarity that works on top of graph kernels, and then show how existing graph kernels can be combined with the framework for performance improvement.

Truss-Based Graph Kernels
The new framework utilizes truss decomposition to generate truss variants of existing base graph kernels. The framework compares graph structures at different scales, represented by the graphs' k-trusses. Theoretically, a more internal truss implies a stronger connection between the edges. Hence, it is hoped that by decomposing graphs into progressively denser subgraphs, it becomes easier to capture their latent structure and compare graphs more effectively. Let G = (V, E) and G' = (V', E') be two graphs and let k be any kernel for graphs. The truss variant kernel k_t of the base kernel k is defined as k_t(G, G') = Σ_{i=2}^{w*_min} k(T_i, T'_i), where w*_min = min(w*(G), w*(G')), and {T_2, T_3, ..., T_{w*_min}} and {T'_2, T'_3, ..., T'_{w*_min}} denote the 2-truss, 3-truss, ..., w*_min-truss of G and G', respectively. This procedure can be seen as running a specific base kernel function on the corresponding trusses of the two graphs separately and adding up the results of each comparison. Next, the authors prove the positive semidefinite nature of the proposed kernel.
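The definition of k_t can be sketched directly (a toy illustration, not the authors' Algorithm 2; graphs are represented here by edge-to-trussness maps, and the base kernel is a deliberately simple PSD kernel over edge sets):

```python
def truss_variant(base_kernel, truss_G, truss_H):
    """k_t(G, G') = sum over i = 2 .. w*_min of base(T_i, T'_i), where T_i
    keeps the edges with trussness >= i and w*_min is the smaller truss
    degeneracy of the two graphs."""
    w_min = min(max(truss_G.values()), max(truss_H.values()))
    total = 0.0
    for i in range(2, w_min + 1):
        T_i = {e for e, t in truss_G.items() if t >= i}
        T_j = {e for e, t in truss_H.items() if t >= i}
        total += base_kernel(T_i, T_j)
    return total

def edge_count_kernel(E1, E2):
    """Toy PSD base kernel: the product of edge counts (a product of 1-D features)."""
    return len(E1) * len(E2)
```

Composed with any truss decomposition routine, this reproduces the framework's run-the-base-kernel-per-truss-and-add-up procedure.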

Statement of Validity of the Truss Variants
Let the base kernel k be any positive semidefinite kernel on graphs; then the truss variant kernel k_t of the base kernel k is positive semidefinite. Let f be the feature mapping corresponding to the kernel k, so that k(G, G') = <f(G), f(G')>. Let t_i(·) be a function that removes from a graph all edges with trussness less than i, so that T_i = t_i(G). If the feature mapping f(t_i(·)) is defined as ψ(·), then k(T_i, T'_i) = <ψ(G), ψ(G')>. Hence k is a valid kernel on T_i and T'_i. As the sum of a set of positive semidefinite kernels, the proposed kernel k_t is therefore also a positive semidefinite kernel. As stated above, the authors have proposed a framework that can improve the performance of existing graph kernels. It can be combined with any algorithm for graph comparison and is not limited to graph kernels. Given two graphs G and G' and a base kernel k, Algorithm 2 gives the procedure for calculating the truss variant kernel k_t.

Time Complexity Analysis
A good graph kernel not only requires good graph similarity representation capabilities and broad applicability, but must also be competitive in terms of time complexity. The truss decomposition itself can be done in polynomial time, so the proposed framework is very promising. Moreover, for most real-world graphs it holds that w*(G) ≪ n. Therefore, the proposed framework adds a relatively small time complexity to the base graph kernel.

EXPERIMENTS AND EVALUATION
In this section, the authors first introduce the datasets used in the experiments and then give the details of the experiments. Finally, a performance comparison between the basic kernels and truss variants is given.

Datasets
In order to evaluate the performance of the framework, the authors applied the framework to benchmark datasets, which include chemoinformatics datasets (MUTAG, PTC-MR, NCI1, and NCI109), bioinformatics datasets (ENZYMES and PROTEINS), social network datasets (IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and REDDIT-MULTI-5K), and synthetic datasets (SYNTHETICnew and Synthie). Note that the chemoinformatics and bioinformatics datasets are labeled, while all other graph datasets are unlabeled. Table 2 summarizes the properties of the datasets used in the experiments.

Experimental Setup
To evaluate both the effectiveness and the efficiency of the proposed graph kernels, the authors made use of the GraKeL library (Siglidis et al., 2020). For the Weisfeiler-Lehman subtree kernel, the number of iterations h was chosen from {1, 2, 3, 4, 5, 6}; similarly, for the neighborhood hash kernel, the neighborhood range R was chosen from {1, 2, 3, 4, 5, 6}. The shortest-path kernel has no hyperparameters, while for the pyramid match graph kernel, the embedding dimension d was chosen from {4, 6, 8, 10} and the number of levels L from {2, 4, 6}. The chosen parameter settings are summarized in Table 3. The authors report average accuracies and standard deviations in Table 4; truss variants that improve on the basic kernels are shown in bold. Furthermore, to study the runtime, the authors examined the computation time of the kernel matrices and compare the running time of the base kernels with that of their truss variants. For each fold of the 10-fold cross-validation, the runtime of the kernel is taken to be the runtime of the kernel matrix that performed best on the validation experiments.

Results
To begin with, base kernels are compared with their truss variants on graph classification. Table 4 demonstrates that the proposed framework improves the classification accuracy on almost all datasets. It is noted that the truss variants outperform their base kernels on 43 out of the 60 experiments. On the PTC-MR and REDDIT-MULTI-5K datasets, all the truss variants improve the classification accuracy achieved by the basic kernels. It is clear that on most datasets, the performance of the framework applied to GR and PM is greatly improved. Truss PM improves the accuracy attained by the PM kernel on all datasets. Specifically, truss GR improves the accuracy on 5 datasets by more than 9%, and it improves accuracy on the synthetic datasets by more than 12%. Conversely, truss WL improves accuracy by only a very small amount on most datasets. One speculation is that the WL graph kernel only takes local graph properties into account, aggregating the neighborhood information of each vertex to generate its features. It is possible that for most vertices, their local neighborhood in a k-truss is roughly the same as their neighborhood in the whole graph.
In terms of running time, it is noted that the additional computational cost of computing the truss variants is generally small. For further analysis, the authors calculated the average truss degeneracy w*_avg(G) of the graphs for each dataset (shown in Figure 2), and it is visually observed that the running time on a dataset has a close relationship with its w*_avg(G) value. In particular, on the IMDB-BINARY and IMDB-MULTI datasets, 6 times more time is needed to calculate the truss variants than the base kernels. One issue of interest is that the WL variant has a relatively larger runtime ratio than the other three methods (Table 3). The intuition is that the WL kernel may compute the kernel matrix significantly faster than the other methods. The authors measured the longest and shortest times for the five graph kernels to compute the kernel matrix over the entire graph (i.e., the 2-truss) on the ENZYMES dataset, with the hyperparameters of the graph kernels set to the maximum and minimum values of the chosen parameter ranges. Since SP has no hyperparameters, it has only one computation time. Figure 3 shows the result, which is in line with the authors' intuition. The calculation time for SP is also relatively short. In general, the additional computational cost is acceptable because the accuracy increases accordingly. Overall, the truss variants perform better on the social network datasets than on the bioinformatics and cheminformatics datasets. To investigate the reason, the authors studied the degree distributions of the datasets, as shown in Figure 4. One notable difference between the social network and chemoinformatics/bioinformatics datasets is that the former follow the power-law distribution common in nature and social life, while the latter do not. In Figure 4, the social network datasets exhibit significant long-tail characteristics.
Since the REDDIT-BINARY and REDDIT-MULTI-5K datasets have degree distributions spanning more than two thousand, the authors use logarithmic axes to show their degree distributions. For these datasets, it is assumed that the higher-order trusses of the graphs capture the most representative areas of the graphs. Conversely, graphs similar to those in the bioinformatics datasets may lead to low truss degeneracies, and many edges may end up with the same trussness. The basic structural differences between datasets are a very plausible reason for the greater accuracy improvements on the social network datasets compared to the chemoinformatics or bioinformatics datasets. Among the synthetic datasets, SYNTHETICnew and Synthie have very different degree distributions, with the latter being closer to the social network datasets. Truss GR improves accuracy by over 15% on the Synthie dataset, and truss SP, truss PM, and truss NH all improve accuracy to a certain extent. The framework thus also proves effective on synthetic datasets, and performs better on Synthie. Finally, the authors choose the IMDB-BINARY dataset and compare the execution of GR and its truss variant over part of the whole range of k-trusses. For k ∈ {2, ..., 20}, the GR kernel and its truss variant are computed to perform graph classification. The authors compare the achieved classification accuracies, and the results are shown in Figure 5. It is noted that the truss variant obtains a higher accuracy compared to GR. In particular, the accuracy of GR drops to a minimum around k = 15, while the accuracy of truss GR remains high. The accuracy of truss GR gradually increases over the interval of k values, which is in line with the authors' expectations. Consistent conclusions can be drawn on the other datasets, which are omitted here.

DISCUSSIONS
This paper proposes a framework for comparing graphs at multiple different scales, which applies to any graph similarity algorithm. The framework capitalizes on the well-known k-truss decomposition of graphs to build a hierarchy of nested subgraphs. For existing base graph kernels, the framework generates efficient truss variants, enhances the ability of the base graph kernels to represent complex substructures, and improves their performance on graph classification tasks. The authors evaluate the performance of the framework on a set of benchmark datasets for graph classification tasks. In most cases, the truss variants perform well and achieve clear improvements over the base kernels, while their time complexity remains very attractive. The framework is a powerful tool for improving the performance of graph kernels and can be applied to any graph comparison algorithm. The truss variants of graph kernels generated by the framework perform well on graph classification tasks and can be used in a wide range of applications such as chemical bioinformatics, cyber security, computer vision, and many others. For example, in chemical bioinformatics, the properties and functions of a substance can be predicted based on its structural similarity to substances with known functions; in cyber security, unknown code samples can be compared with known malware samples and clean code to detect malware; in computer vision, images can be classified, especially in biomedical imaging, to distinguish between different brain states or to predict whether a person has a certain disease.

CONCLUSION
In this article, the authors propose a framework that enhances the performance of graph kernels, which uses truss decomposition to allow existing algorithms to compare graphs at different scales. Extensive experiments demonstrate the improvement of the truss variants over the basic graph kernels in terms of graph classification accuracy with a relatively small increase in time complexity.

Conflict of Interest
The authors of this publication declare there is no conflict of interest.