Big Data Analytics Using Local Exceptionality Detection

Martin Atzmueller, Dennis Mollenhauer, Andreas Schmidt

Source Title: Enterprise Big Data Engineering, Analytics, and Management

DOI: 10.4018/978-1-5225-0293-7.ch007

Abstract

Large-scale data processing is one of the key challenges in many application domains, especially for ubiquitous and big data. In these contexts, subgroup discovery provides a flexible method for both data analysis and knowledge discovery. Subgroup discovery and pattern mining are important descriptive data mining tasks. They can be applied, for example, to obtain an overview of the relations in the data, for automatic hypothesis generation, and for a number of knowledge discovery applications. This chapter presents the novel SD-MapR algorithmic framework for large-scale local exceptionality detection, implemented using subgroup discovery on the Map/Reduce framework. We describe the basic algorithm in detail and provide an experimental evaluation using several real-world datasets. We tackle two algorithmic variants focusing on simple and more complex target concepts; for the latter, we present an implementation of exceptional model mining on large attributed graphs. The results of our evaluation show the scalability of the presented approach for large datasets.

Introduction

With the exponential growth of available data, e.g., due to ubiquitous applications and services, large-scale data mining poses many challenges. Efficient and scalable methods need to be developed that, on the one hand, can handle such large data and, on the other hand, support an efficient and scalable analysis approach. In this chapter, we focus on subgroup discovery for local exceptionality detection on large datasets. During data exploration, the data analyst might, for example, be interested in partitions of the data that show specific exceptional characteristics, together with descriptions of these partitions. Subgroup discovery (e.g., Klösgen 1996; Wrobel 1997; Atzmueller 2015) is an exploratory analysis approach for identifying such a subset of the data with a concise description; here, we also specifically consider the variant of exceptional model mining (Leman 2008; Duivesteijn 2016) as an approach for modeling complex exceptionality criteria. Intuitively, subgroup discovery aims at identifying such an exceptional subgroup of the whole dataset, e.g., one with a notably different distribution of some target concept, where the subgroup should typically also be as large as possible. Exceptional model mining focuses especially on complex target properties; it considers specific model classes, such as a correlation model between two variables, linear regression, or complex graph properties.
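
To make the correlation model class mentioned above more concrete, the following minimal Python sketch scores a candidate subgroup by how much the correlation between two variables inside the subgroup differs from the correlation in the rest of the data. The function names, the toy records, and the choice of the absolute difference as the score are illustrative assumptions, not taken from the chapter.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_difference(rows, selector, x, y):
    """Hypothetical exceptionality score: |corr inside subgroup - corr outside subgroup|."""
    inside = [r for r in rows if selector(r)]
    outside = [r for r in rows if not selector(r)]
    r_in = pearson([r[x] for r in inside], [r[y] for r in inside])
    r_out = pearson([r[x] for r in outside], [r[y] for r in outside])
    return abs(r_in - r_out)


# Toy data (made up): inside the subgroup "region == north" x and y rise together,
# outside it they move in opposite directions.
rows = [
    {"region": "north", "x": 1, "y": 1},
    {"region": "north", "x": 2, "y": 2},
    {"region": "north", "x": 3, "y": 3},
    {"region": "south", "x": 1, "y": 3},
    {"region": "south", "x": 2, "y": 2},
    {"region": "south", "x": 3, "y": 1},
]
score = correlation_difference(rows, lambda r: r["region"] == "north", "x", "y")
```

A subgroup description such as region = north would be reported as exceptional here because the fitted model (the correlation coefficient) differs strongly from that of the remaining data.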

Overall, subgroup discovery is a broadly applicable data mining technique that can be used for descriptive as well as predictive data mining. It can provide an overview of the relations in the data, for example, for automatic hypothesis generation, for attribute construction, or for obtaining a rule-based classification model. The basic idea is to identify subgroups covering instances of the dataset that show some interesting, i.e., unexpected, deviating, or exceptional, behavior concerning a given target concept. This notion can be flexibly formalized using a quality function. We can estimate, for example, the deviation of the mean of a numeric target concept in the subgroup compared to the whole dataset; more complex functions utilizing graph-structured data consider, e.g., the density of a certain subgraph compared to the expected density of a null model given by a random edge assignment approach.
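
As an illustration of the mean-based quality function described above, the following Python sketch weights the difference between the subgroup mean and the overall mean of a numeric target by the subgroup size raised to a parameter a. This size-weighted form is a commonly used family of quality functions and is assumed here for illustration; the chapter's exact definition may differ, and all names and data values are hypothetical.

```python
from statistics import mean


def mean_shift_quality(rows, selector, target, a=0.5):
    """q(S) = n_S^a * (mean of target in S - mean of target in all rows).

    `a` trades off subgroup size against deviation (an assumed parameterization).
    """
    covered = [r[target] for r in rows if selector(r)]
    if not covered:
        return 0.0  # an empty subgroup carries no evidence
    overall = mean(r[target] for r in rows)
    return len(covered) ** a * (mean(covered) - overall)


# Toy data (made up): does the subgroup "smoker" deviate in mean cost?
rows = [
    {"age": 25, "smoker": True, "cost": 10.0},
    {"age": 60, "smoker": True, "cost": 30.0},
    {"age": 30, "smoker": False, "cost": 10.0},
    {"age": 65, "smoker": False, "cost": 14.0},
]
q = mean_shift_quality(rows, lambda r: r["smoker"], "cost", a=1.0)
```

With a = 1.0 the score grows linearly with subgroup size, which reflects the intuition that an interesting subgroup should be both deviating and as large as possible; smaller values of a put more emphasis on the deviation itself.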

In this chapter, we present the novel SD-MapR algorithmic framework for large-scale subgroup discovery. Based on the data projection techniques of the FP-Growth (Han et al. 2000) and the Parallel FP-Growth (PFP) algorithm (Li et al. 2008) for large-scale frequent pattern mining, SD-MapR employs the Map/Reduce framework (Dean & Ghemawat 2008) for large-scale data processing. The basic idea of SD-MapR, inspired by the PFP algorithm, is to construct projected databases such that the subgroup discovery task can be deployed independently on several computation clusters in a divide-and-conquer manner. For local exceptionality detection, we propose the efficient subgroup discovery algorithms SD-Map* (Atzmueller & Lemmerich 2009), GP-Growth (Lemmerich et al. 2012), and COMODO (Atzmueller et al. 2015a), which can be applied for instantiating SD-MapR. Specifically, we present adaptations of the SD-Map* and COMODO algorithms for implementing SD-MapR.
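
The projection idea can be sketched in a few lines of Python. The sketch below is a single-process imitation of a map, shuffle, and reduce pass in the spirit of PFP-style group-dependent projections; it is not the SD-MapR implementation. The selector ordering, the group assignment, the brute-force enumeration inside the reducer (where a real instantiation would run an algorithm such as SD-Map*), and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_project(record, order, group_of):
    """Mapper: for each group, emit the ordered prefix ending at the group's last selector."""
    selectors, target = record
    items = sorted((s for s in selectors if s in order), key=order.__getitem__)
    seen = set()
    for j in range(len(items) - 1, -1, -1):
        gid = group_of[items[j]]
        if gid not in seen:
            seen.add(gid)
            yield gid, (tuple(items[: j + 1]), target)


def reduce_group(gid, projected, group_of, max_len=2):
    """Reducer: (count, target sum) for every pattern whose last selector belongs to gid."""
    stats = defaultdict(lambda: [0, 0.0])
    for items, target in projected:
        for k in range(1, max_len + 1):
            for pattern in combinations(items, k):
                if group_of[pattern[-1]] == gid:
                    stats[pattern][0] += 1
                    stats[pattern][1] += target
    return stats


def run(records, order, group_of):
    """Simulate map, shuffle, and reduce in one process."""
    shuffled = defaultdict(list)
    for record in records:
        for gid, value in map_project(record, order, group_of):
            shuffled[gid].append(value)
    result = {}
    for gid, projected in shuffled.items():
        for pattern, (count, total) in reduce_group(gid, projected, group_of).items():
            result[pattern] = (count, total)
    return result


# Toy data (made up): each record is (set of selectors, numeric target value).
records = [({"a", "b", "c"}, 1.0), ({"a", "c"}, 0.0), ({"b", "c"}, 1.0), ({"a", "b"}, 1.0)]
order = {"a": 0, "b": 1, "c": 2}      # assumed global selector order
group_of = {"a": 0, "b": 0, "c": 1}   # assumed assignment of selectors to groups
result = run(records, order, group_of)
```

Because every pattern is evaluated only in the group that owns its last selector, and that group's projected database contains every record prefix needed to count it, the reducers can work independently and their per-pattern statistics (from which a quality function can be computed) equal those obtained on the full dataset.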

The remainder of this chapter is structured as follows: In the next section, we introduce some preliminaries on local exceptionality detection using subgroup discovery and exceptional model mining, the respective state-of-the-art algorithms, and the Map/Reduce framework. After that, we describe the novel SD-MapR algorithmic framework in detail. Next, we provide a comprehensive evaluation of the presented algorithms using ubiquitous data, and show the scalability and performance for large-scale datasets. Finally, we conclude with a summary and point out interesting options for future work.
