Matrix Decomposition-Based Dimensionality Reduction on Graph Data

Hiroto Saigo, Koji Tsuda
Copyright © 2012 | Pages: 25
DOI: 10.4018/978-1-61350-053-8.ch011

Abstract

A graph is a mathematical framework that allows us to represent and manage many kinds of real-world data, such as relational, multimedia, and biomedical data. When each data point is represented as a graph and we are given a collection of graphs, one task is to extract a few common patterns that capture the properties of each population. Frequent graph mining algorithms such as AGM, gSpan, and Gaston can enumerate all frequent patterns in graph data; however, the number of patterns grows exponentially, so it is essential to output only discriminative patterns. There is much existing research on this topic, but this chapter focuses on the use of matrix decomposition techniques and explains two general cases, in which either i) no target label is available, or ii) a target label is available for each data point. The resulting method is a branch-and-bound pattern mining algorithm with an efficient pruning condition, and we evaluate its effectiveness on chemoinformatics data.

Introduction

A graph is a powerful mathematical framework that enables us to represent and manage various real-world objects in a natural way; examples include XML, social networks, and biological networks. In this chapter, we focus on applications in chemoinformatics, where a chemical compound is represented as a labeled graph in which atoms and bonds carry labels such as H, C, N and single, double, aromatic, respectively. Given a large number of such graphs, a recent approach to extracting common features is through frequently appearing subgraphs. Even though this involves the subgraph isomorphism problem, which is NP-hard, after the pioneering work AGM (Inokuchi, 2005), several fast frequent graph miners such as Gaston (Nijssen & Kok, 2004) and gSpan (Yan & Han, 2002a) have been proposed. A frequent subgraph mining algorithm enumerates all subgraph patterns that appear more than m times in a graph database; the threshold m is called the minimum support. As a next step after mining frequent patterns, we consider learning rules that classify graphs into a positive class and a negative class according to given labels. To achieve the best classification accuracy when classifying graph data, the minimum support threshold (the number of graphs in which a subgraph appears) has to be set to a small value (Wale & Karypis, 2006; Kazius, Nijssen, Kok, Baeck, & Ijzerman, 2006; Helma, Cramer, Kramer, & Raedt, 2004). However, such a setting creates millions of patterns, making it difficult to store them and to use them in the subsequent learning step. To avoid creating a large number of uninformative patterns, we make use of a tree-shaped search space for enumerating frequent patterns (Figure 4). As we go down the tree from the root node, we encounter many child nodes corresponding to supergraphs of the parent node's pattern. For efficiency, we want to traverse only the nodes that are necessary for the subsequent learning step.
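To make the minimum-support idea concrete, the sketch below filters subgraph patterns by support over a toy graph database. It is purely illustrative and not the chapter's mining algorithm: real miners such as gSpan compute each pattern's occurrence list via subgraph isomorphism tests, whereas here the occurrence lists are assumed to be given.

```python
# Hypothetical illustration of minimum-support filtering.  Each pattern
# is represented only by the set of graph IDs it occurs in (its
# occurrence list); computing these lists is the hard part that a miner
# like gSpan performs.

def frequent_patterns(occurrences, min_support):
    """Keep only patterns occurring in at least `min_support` graphs."""
    return {p: ids for p, ids in occurrences.items()
            if len(ids) >= min_support}

# Toy occurrence lists for three subgraph patterns over graphs 0..4.
occ = {
    "C-C":   {0, 1, 2, 3, 4},   # appears in every graph
    "C=O":   {0, 2, 4},
    "C-N-C": {1},               # rare pattern, pruned at min_support=3
}

print(frequent_patterns(occ, min_support=3))
```

Lowering `min_support` admits rarer patterns (often the discriminative ones) but inflates the output, which is exactly the blow-up the tree-shaped search with pruning is meant to avoid.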

Figure 4.

Classification accuracy (left) and computational time (right) against maximum pattern size (maxpat) in the CPDB dataset. In the table on the right, “Mining Time” stands for computational time for pattern search, and “Numerical Time” stands for computational time for matrix and vector operations.


In this chapter, we assume that the occurrences of the subgraphs mined so far are recorded in a design matrix X. Since the number of subgraphs can be very large, the design matrix can be too large to store in memory. We therefore employ a so-called matrix-free method, which constructs only a part of the whole matrix in an iterative fashion. More precisely, we take advantage of the Lanczos method for unsupervised learning, and of partial least squares (PLS) regression for supervised learning. The connection between the two appears later in this chapter.
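The matrix-free idea can be sketched as follows: to find the leading eigenpair of X^T X with the Lanczos method, the operator only needs to be applied to vectors, so X^T X is never formed explicitly. The code below is a minimal illustration under that assumption (the function names and the dense stand-in matrix are ours, not the chapter's implementation; in the chapter's setting the product X v would itself be computed pattern by pattern during mining).

```python
import numpy as np

def lanczos(matvec, n, k, rng):
    """k-step Lanczos tridiagonalization of a symmetric operator that is
    available only through matrix-vector products `matvec`."""
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)          # diagonal of the tridiagonal matrix T
    beta = np.zeros(k)           # off-diagonal of T
    q_prev = np.zeros(n)
    b = 0.0
    for j in range(k):
        Q[:, j] = q
        w = matvec(q) - b * q_prev   # three-term recurrence
        alpha[j] = q @ w
        w -= alpha[j] * q
        b = np.linalg.norm(w)
        beta[j] = b
        if b < 1e-12:                # breakdown: Krylov space exhausted
            k = j + 1
            break
        q_prev, q = q, w / b
    T = (np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1)
         + np.diag(beta[:k - 1], -1))
    return T, Q[:, :k]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))      # stand-in design matrix
mv = lambda v: X.T @ (X @ v)          # X^T X v without forming X^T X
T, Q = lanczos(mv, 8, 8, rng)
theta = np.linalg.eigvalsh(T).max()   # top Ritz value
exact = np.linalg.eigvalsh(X.T @ X).max()
print(theta, exact)
```

The largest Ritz value of the small tridiagonal matrix T approximates the largest eigenvalue of X^T X, and only matrix-vector products with X and X^T were required, which is what makes the approach viable when the full pattern-occurrence matrix does not fit in memory.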

Although we only deal with an application to chemical informatics data in this chapter, the proposed approach is applicable to other data such as protein (Jin, Young, & Wang, 2009), RNA (Tsuda & Kudo, 2006), text (Kudo, Maeda, & Matsumoto, 2005), image (Nowozin, Tsuda, Uno, Kudo, & Bakir, 2007), video (Nowozin, Bakir, & Tsuda, 2007) and so forth.


As a next step after mining frequent patterns, many researchers have worked on integrating frequent pattern mining with rule learning algorithms. These approaches are roughly categorized into filter approaches and wrapper approaches (Kohavi & John, 1997).

A filter approach first enumerates all frequent patterns and then performs the learning step afterwards (PatClass (Cheng, Yan, Han, & Hsu, 2007)). An advantage of this approach is that recent learning algorithms can be employed without modification. Fei and Huan considered the spatial distribution of patterns for classifying graph data (Pattern SFS (Fei & Huan, 2009); LPGB (Fei & Huan, 2010)). A major drawback of the filter approach is that the number of frequent patterns can become too large for the subsequent learning step because of memory and storage constraints.
