Data Pattern Tutor for AprioriAll and PrefixSpan

Mohammed Alshalalfa

doi:10.4018/978-1-60566-010-3.ch083


Mohammed Alshalalfa (University of Calgary, Canada)

Source Title: Encyclopedia of Data Warehousing and Mining, Second Edition

Abstract

Data mining can be described as data processing that uses sophisticated data search capabilities and statistical algorithms to discover patterns and correlations in large pre-existing databases (Agrawal & Srikant 1995; Zhao & Sourav 2003). These patterns can yield new and important information, which can in turn be translated into improvements in many fields. In this paper, we focus on the usability of sequential data mining algorithms. A user study we conducted indicates that many of these algorithms are difficult to comprehend. Our goal is to build an interface that acts as a “tutor” to help users better understand how data mining works. We consider two of the algorithms more commonly used by our students for discovering sequential patterns, namely AprioriAll and PrefixSpan. We hope the tool has educational value and can serve as a teaching aid for comprehending data mining algorithms. We concentrated on making the user interface easy to use for naïve end users with minimal computer literacy; the interface is intended for beginners. This should give the tool a wider audience.

Background

Kopanakis and Theodoulidis (2003) highlight the importance of visual data mining and argue that pictorial representations of data mining outcomes are more meaningful than plain statistics, especially for non-technical users. They suggest many modeling techniques pertaining to association rules, relevance analysis, and classification. With regard to association rules, they suggest using grid and bar representations to visualize not only the raw data but also support, confidence, the association rules themselves, and their evolution over time.

Eureka! is a visual knowledge discovery tool that specializes in two-dimensional (2D) modeling of clustered data for extracting interesting patterns (Manco, Pizzuti & Talia 2004). VidaMine is a general-purpose tool that provides its users with three visual data mining modeling environments: (a) the meta-query environment, which allows users to specify relationships between the input datasets through “hooks” and “chains”; (b) the association rule environment, which allows users to create association rules by dragging and dropping items into the IF and THEN baskets; and (c) the clustering environment, for selecting data clusters and their attributes (Kimani et al., 2004). After the model derivation phase, the user can perform analysis and visualize the results.

Main Thrust

AprioriAll is a sequential data pattern discovery algorithm. It consists of five phases that work together to uncover sequential data patterns in large datasets. The first three phases, Sorting, L-itemset, and Transformation, take the original database and prepare the information for AprioriAll. The Sorting phase groups the information, for example a list of customer transactions, into sequences with the customer ID as the primary key. The L-itemset phase then scans the sorted database to obtain length-one itemsets that meet a predetermined minimum support value. These length-one itemsets are then mapped to integer values, which makes generating larger candidate patterns much easier. In the Transformation phase, the sorted database is updated to use the mapped values from the previous phase. If an item in the original sequence does not meet minimum support, it is removed in this phase, because only the parts of the customer sequences that include items found in the length-one itemsets can be represented.
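The three preprocessing phases can be sketched in Python as follows. This is a minimal illustration and not the tutor's implementation: the function names, the `(customer_id, time, items)` record layout, and the toy data are assumptions made for the example, and support is counted here as the number of customers whose sequence contains an item.

```python
from collections import defaultdict

def sort_phase(transactions):
    """Sorting: group (customer_id, time, items) records into one
    time-ordered sequence of item sets per customer."""
    sequences = defaultdict(list)
    for customer_id, _time, items in sorted(transactions, key=lambda t: (t[0], t[1])):
        sequences[customer_id].append(set(items))
    return dict(sequences)

def litemset_phase(sequences, min_support):
    """L-itemset: keep single items found in at least min_support customer
    sequences and map each one to an integer (assumed numbering: 1, 2, ...)."""
    counts = defaultdict(int)
    for seq in sequences.values():
        for item in set().union(*seq):  # count each item once per customer
            counts[item] += 1
    frequent = sorted(item for item, n in counts.items() if n >= min_support)
    return {item: i + 1 for i, item in enumerate(frequent)}

def transformation_phase(sequences, mapping):
    """Transformation: rewrite sequences with the mapped integers and drop
    items (and emptied transactions) that lack minimum support."""
    transformed = {}
    for customer_id, seq in sequences.items():
        new_seq = [{mapping[i] for i in t if i in mapping} for t in seq]
        new_seq = [t for t in new_seq if t]
        if new_seq:
            transformed[customer_id] = new_seq
    return transformed

# Hypothetical toy data: (customer_id, time, items)
transactions = [
    (2, 1, ["a", "b"]), (1, 2, ["c"]), (1, 1, ["a"]),
    (2, 2, ["d"]), (3, 1, ["a", "c"]),
]
sequences = sort_phase(transactions)
mapping = litemset_phase(sequences, min_support=2)
transformed = transformation_phase(sequences, mapping)
print(mapping)      # {'a': 1, 'c': 2}
print(transformed)  # {1: [{1}, {2}], 2: [{1}], 3: [{1, 2}]}
```

In this toy run, items `b` and `d` each appear for only one customer, so they are dropped during Transformation and customer 2's second transaction disappears entirely.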

After the data has been preprocessed, AprioriAll efficiently determines sequential patterns in the Sequence phase. Length-K sequences are used to generate length-(K+1) candidate sequences until no further candidates can be generated (i.e., K+1 is greater than the length of the longest sequence in the transformed database). Finally, the Maximal phase prunes this list of candidates by removing any sequential pattern that is contained within a larger sequential pattern.
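The Sequence and Maximal phases can be sketched in the same spirit. The code below is a simplified illustration under stated assumptions rather than the full algorithm: each pattern element is a single mapped integer (patterns whose elements are multi-item sets are ignored), candidates are built by a plain join of length-K sequences without the additional subsequence pruning a full implementation would apply, and all names and the toy database are hypothetical.

```python
def is_subsequence(candidate, customer_seq):
    """True if each candidate item occurs in a distinct transaction, in order."""
    pos = 0
    for item in candidate:
        while pos < len(customer_seq) and item not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next item must come from a later transaction
    return True

def support(candidate, database):
    """Number of customer sequences that contain the candidate."""
    return sum(is_subsequence(candidate, seq) for seq in database.values())

def sequence_phase(database, min_support):
    """Grow length-K frequent sequences into length-(K+1) candidates
    until no candidate meets minimum support."""
    items = sorted(set().union(*(t for seq in database.values() for t in seq)))
    current = [(i,) for i in items if support((i,), database) >= min_support]
    frequent = list(current)
    while current:
        # Join step: a and b combine when a without its first item
        # equals b without its last item.
        candidates = {a + (b[-1],) for a in current for b in current
                      if a[1:] == b[:-1]}
        current = sorted(c for c in candidates
                         if support(c, database) >= min_support)
        frequent.extend(current)
    return frequent

def contained_in(short, long):
    """True if short is an order-preserving subsequence of long."""
    it = iter(long)
    return all(item in it for item in short)

def maximal_phase(frequent):
    """Drop every pattern that is contained in another frequent pattern."""
    return [s for s in frequent
            if not any(s != t and contained_in(s, t) for t in frequent)]

# Hypothetical transformed database: customer_id -> list of mapped item sets
database = {1: [{1}, {2}, {3}], 2: [{1}, {3}], 3: [{1, 2}, {3}], 4: [{2}]}
frequent = sequence_phase(database, min_support=2)
maximal = maximal_phase(frequent)
print(frequent)  # [(1,), (2,), (3,), (1, 3), (2, 3)]
print(maximal)   # [(1, 3), (2, 3)]
```

Here the length-one patterns are all frequent, but each is contained in a length-two pattern, so only `(1, 3)` and `(2, 3)` survive the Maximal phase.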
