Summarization in Pattern Mining

Mohammad Al Hasan

doi:10.4018/978-1-60566-010-3.ch287

Access Full-Text Recommend to Your Library

Buy Instant Access to This Chapter

Instant access upon order completion

Add to Cart

Share

Recommend to Librarian Recommend to Colleague Fair Use Policy

Free Content

Sample PDF

More Information

Rights & Permissions

Favorite Cite Chapter

MLA

Al Hasan, Mohammad. "Summarization in Pattern Mining." Encyclopedia of Data Warehousing and Mining, Second Edition, edited by John Wang, IGI Global Scientific Publishing, 2009, pp. 1877-1883. https://doi.org/10.4018/978-1-60566-010-3.ch287

APA

Al Hasan, M. (2009). Summarization in Pattern Mining. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining, Second Edition (pp. 1877-1883). IGI Global Scientific Publishing. https://doi.org/10.4018/978-1-60566-010-3.ch287

Chicago

Al Hasan, Mohammad. "Summarization in Pattern Mining." In Encyclopedia of Data Warehousing and Mining, Second Edition, edited by John Wang, 1877-1883. Hershey, PA: IGI Global Scientific Publishing, 2009. https://doi.org/10.4018/978-1-60566-010-3.ch287

Export Reference

For Librarians

Summarization in Pattern Mining

Mohammad Al Hasan (Rensselaer Polytechnic Institute, USA)

Source Title: Encyclopedia of Data Warehousing and Mining, Second Edition

DOI: 10.4018/978-1-60566-010-3.ch287

Abstract

The research on mining interesting patterns from transactions or scientific datasets has matured over the last two decades. At present, numerous algorithms exist to mine patterns of variable complexities, such as set, sequence, tree, graph, etc. Collectively, they are referred as Frequent Pattern Mining (FPM) algorithms. FPM is useful in most of the prominent knowledge discovery tasks, like classification, clustering, outlier detection, etc. They can be further used, in database tasks, like indexing and hashing while storing a large collection of patterns. But, the usage of FPM in real-life knowledge discovery systems is considerably low in comparison to their potential. The prime reason is the lack of interpretability caused from the enormity of the output-set size. For instance, a moderate size graph dataset with merely thousand graphs can produce millions of frequent graph patterns with a reasonable support value. This is expected due to the combinatorial search space of pattern mining. However, classification, clustering, and other similar Knowledge discovery tasks should not use that many patterns as their knowledge nuggets (features), as it would increase the time and memory complexity of the system. Moreover, it can cause a deterioration of the task quality because of the popular “curse of dimensionality” effect. So, in recent years, researchers felt the need to summarize the output set of FPM algorithms, so that the summary-set is small, non-redundant and discriminative. There are different summarization techniques: lossless, profile-based, cluster-based, statistical, etc. In this article, we like to overview the main concept of these summarization techniques, with a comparative discussion of their strength, weakness, applicability and computation cost.

Chapter Preview

Top

Background

FPM had been the core research topic in the field of data mining for the last decade. Since, its inception with the seminal paper of mining association rules by Argrawal et. al (Agrawal & Srikant, 1994), it had matured enormously. Currently, we have very efficient algorithms for mining patterns with higher complexity, like sequence (Zaki, 2001), tree (Zaki, 2005) and graph (Yan & Han, 2002; Hasan, Chaoji, Salem & Zaki, 2005). The objective of FPM is as follows: Given a database, , of a collection of events (an event can be as simple as a set or as complex as a graph) and a user defined support threshold, ; return all patterns (patterns can be set, tree, graph, etc. depending on ) that are frequent with respect to . Sometimes, additional constraints can be imposed besides the minimum support criteria. For details on FPM, see data mining textbooks (Han & Kamber, 2001).

Complete Chapter List

Search this Book:

Reset