Receive a 20% Discount on All Purchases Directly Through IGI Global's Online Bookstore

Carol J. Romanowski (Rochester Institute of Technology, USA)

Copyright: © 2009
|Pages: 7

DOI: 10.4018/978-1-60566-010-3.ch147

Chapter Preview

TopA graph *G* is a structure that contains a set of vertices *V* and their incident edges *E* and comprises many substructures (see Figure 1). A *general subgraph* is a subset of any vertices and edges in the parent graph. An *induced subgraph* contains a subset of the vertices and all edges between those vertices that exist in the parent graph. A *connected subgraph* is a subset of vertices that are connected with edges; no isolated vertices are allowed. *Trees* are acyclic, directed, branched subgraphs that can be ordered (the order of branch nodes is fixed) or unordered (the order of branch nodes is of no concern). Rooted trees have one vertex with no edges coming into it; all other vertices are reachable from the root, while free trees have no root. A *path* is a subgraph that has no branches.

The most common graph mining task is finding frequent subgraph structures within either a large, single graph or a database of smaller graphs (Washio & Motoda, 2003). Most methods for finding frequent subgraphs are based on the Apriori algorithm (Agrawal & Srikant, 1994) and involve four steps:

*1.*Starting with a defined smallest unit, generate increasingly larger subgraphs.

*2.*Check that the generated subgraphs appear in the database or graph.

*3.*Count the number of times each subgraph appears.

*4.*Discard subgraphs that are less frequent than the user-specified minimum, or

*support*, thus avoiding investigation of their supergraphs. Also discard isomorphisms (subgraphs with identical vertices and edges).

Subgraph matching, or isomorphism testing, is thought to have no polynomial time algorithm for general graphs (Skiena, 1998). Graph mining algorithms attempt to reduce the computational load of this problem in many ways, all of which rely on the downward closure property (see step 4). This property states that a subgraph of a larger graph is more frequent than the larger graph; therefore, if a particular subgraph is infrequent, it is removed from the search space because further expansion would not yield a viable candidate. Another way to reduce search space is to restrict candidates to subgraphs with no common edges (Washio & Motoda, 2003).

TopThe most basic difference among graph mining algorithms is whether the input data is a single graph or a database of smaller graphs. Algorithms written for a single graph input can be used on a database of smaller graphs, but the reverse is not true (Goethels et al., 2005). Most general graph and tree mining algorithms are frequent subgraph methods focused on a database of graphs, although the single graph approach is more versatile.

Search this Book:

Reset