Big Data Mining and Analytics

Carson Kai-Sang Leung

Source Title: Encyclopedia of Business Analytics and Optimization

DOI: 10.4018/978-1-4666-5202-6.ch030

Introduction

Data mining and analytics aims to analyze valuable data, such as shopper market basket data, and to extract implicit, previously unknown, and potentially useful information from them. Due to advances in technology, high volumes of valuable data (such as streams of banking, financial, and marketing data) are generated in various real-life business applications in modern organizations and society. This leads us into the new era of Big Data (Madden, 2012; Mishne, Dalton, Li, Sharma, & Lin, 2013; Suchanek & Weikum, 2013). Intuitively, Big Data are interesting high-velocity, high-value, and/or high-variety data with volumes beyond the ability of commonly used software to capture, manage, and process within a tolerable elapsed time. Hence, new forms of processing are needed to enable enhanced decision making, insight, knowledge discovery, and process optimization.

This need drives and motivates research and practice in business analytics and optimization, which require techniques such as Big Data mining and analytics, business process optimization, and applied business statistics, as well as business intelligence solutions and information systems. Developing systematic or quantitative processes to mine and analyze Big Data allows us to continuously or iteratively explore, investigate, and understand past business performance so as to gain new insight and drive business planning.

Over the past few years, several algorithms have been proposed that use the MapReduce model, which mines the search space with distributed or parallel computing, for different Big Data mining and analytics tasks (Luo, Ding, & Huang, 2012; Shi, 2012; Shim, 2012; Condie, Mineiro, Polyzotis, & Weimer, 2013; Kumar, Niu, & Ré, 2013). One such task is frequent pattern mining, which discovers interesting knowledge in the form of frequently occurring sets of merchandise items or events. In this chapter, we focus mainly on frequent pattern mining from Big Data with MapReduce.
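
To make the idea of frequent pattern mining with the "map" and "reduce" functions concrete, the following is a minimal single-machine sketch in Python. It is not an algorithm taken from the cited works: the function names (`map_phase`, `shuffle`, `reduce_phase`, `mine_frequent`), the `max_size` cap on itemset size, and the sample baskets are all illustrative assumptions, and a real deployment would run the phases on a distributed framework rather than in one process.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Map: emit an (itemset, 1) pair for every itemset of up to max_size items."""
    items = sorted(set(transaction))
    for k in range(1, max_size + 1):
        for itemset in combinations(items, k):
            yield itemset, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(itemset, counts, minsup):
    """Reduce: sum the counts and keep the itemset only if it is frequent."""
    total = sum(counts)
    if total >= minsup:
        yield itemset, total


def mine_frequent(transactions, minsup, max_size=2):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    result = {}
    for itemset, counts in shuffle(pairs).items():
        for key, total in reduce_phase(itemset, counts, minsup):
            result[key] = total
    return result


# Hypothetical market basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
frequent = mine_frequent(baskets, minsup=2)
```

With a minimum support of 2, every single item and every pair of items in these four baskets is frequent. Enumerating all itemsets in the map phase grows quickly with transaction length, which is why the sketch caps the itemset size.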

Background

Since the introduction of the research problem of frequent pattern mining (Agrawal, Imieliński, & Swami, 1993), numerous algorithms have been proposed (Hipp, Güntzer, & Nakhaeizadeh, 2000; Ullman, 2000; Ceglar & Roddick, 2006). Notable ones include the classical Apriori algorithm (Agrawal & Srikant, 1994) and its variants such as the Partition algorithm (Savasere, Omiecinski, & Navathe, 1995). The Apriori algorithm uses a level-wise breadth-first bottom-up approach with a candidate generate-and-test paradigm to mine frequent patterns from transactional databases of precise data. The Partition algorithm divides the databases into several partitions and applies the Apriori algorithm to each partition to obtain patterns that are locally frequent in the partition. As being locally frequent is a necessary condition for a pattern to be globally frequent, these locally frequent patterns are tested to see if they are globally frequent in the databases. To avoid the candidate generate-and-test paradigm, the tree-based FP-growth algorithm (Han, Pei, & Yin, 2000) was proposed. It uses a depth-first pattern-growth (i.e., divide-and-conquer) approach to mine frequent patterns using a tree structure that captures the contents of the databases. By extracting appropriate tree paths, projected databases containing relevant transactions are formed, from which frequent patterns can be discovered.
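
The level-wise candidate generate-and-test idea and the partition-then-verify idea described above can be sketched as follows. This is a simplified Python illustration and not the published implementations: the function names, the use of an absolute count in `apriori` and a ratio in `partition_mine`, and the equal-sized partitioning scheme are assumptions made for the example.

```python
import math
from itertools import combinations


def count_support(candidates, transactions):
    """Count how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def apriori(transactions, minsup):
    """Level-wise (breadth-first) mining; minsup is an absolute count."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Test step: keep the candidates that meet the threshold.
        counts = count_support(candidates, transactions)
        level = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(level)
        # Generate step: join frequent k-itemsets into (k+1)-candidates and
        # prune any candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def partition_mine(transactions, minsup_ratio, num_partitions=2):
    """Mine each partition locally, then verify the candidates globally."""
    transactions = [frozenset(t) for t in transactions]
    size = max(1, math.ceil(len(transactions) / num_partitions))
    local_candidates = set()
    for start in range(0, len(transactions), size):
        part = transactions[start:start + size]
        local_minsup = math.ceil(minsup_ratio * len(part))
        local_candidates |= set(apriori(part, local_minsup))
    # A globally frequent pattern is locally frequent in at least one
    # partition, so one pass over the full data is enough to verify.
    global_minsup = math.ceil(minsup_ratio * len(transactions))
    counts = count_support(local_candidates, transactions)
    return {c: n for c, n in counts.items() if n >= global_minsup}


# Hypothetical market basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
direct = apriori(baskets, minsup=2)
partitioned = partition_mine(baskets, minsup_ratio=0.5)
```

On this data the two approaches return the same frequent patterns. The partitioned version may examine extra candidates that are frequent only locally, and these are discarded in the global verification pass.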

In many real-life applications, the available data are not precise data but uncertain data (Chen & Wang, 2011; Tong, Chen, Cheng, & Yu, 2012; Jiang & Leung, 2013; Leung, Cuzzocrea, & Jiang, 2013; Leung & Tanbeer, 2013). Examples include sensor data and privacy-preserving data. Over the past few years, several algorithms—such as the tree-based UF-growth algorithm (Leung, Mateo, & Brajczuk, 2008)—have been proposed to mine and analyze these uncertain data.
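
For uncertain data, support is commonly replaced by expected support. The short Python sketch below assumes a model that is not spelled out in this preview: each item in a transaction carries an existential probability, and items within a transaction are independent, so the expected support of an itemset is the sum over all transactions of the product of its items' probabilities. The function name and the sample sensor readings are hypothetical.

```python
from math import prod


def expected_support(itemset, uncertain_db):
    """Sum, over all transactions, of the product of the existential
    probabilities of the itemset's items (a missing item counts as 0)."""
    return sum(
        prod(transaction.get(item, 0.0) for item in itemset)
        for transaction in uncertain_db
    )


# Hypothetical uncertain transactions: item -> existential probability.
readings = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.6, "c": 0.8},
    {"b": 1.0, "c": 0.4},
]
support_a = expected_support({"a"}, readings)
support_ab = expected_support({"a", "b"}, readings)
```

Under these assumptions, an itemset is considered frequent when its expected support meets the user-specified minimum support threshold.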

Key Terms in this Chapter

Frequent Itemset (or Frequent Pattern): Is an itemset or a pattern whose actual support (or expected support) equals or exceeds the user-specified minimum support threshold.

Itemset: Is a set of items.

Business Intelligence: Is a set of theories, methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information.

MapReduce: Is a high-level programming model, which uses the “map” and “reduce” functions, for processing high volumes of data.

Data Mining: Refers to non-trivial extraction of implicit, previously unknown and potentially useful information from data.

Big Data: Are interesting high-velocity, high-value, and/or high-variety data with volumes beyond the ability of commonly used software to capture, manage, and process within a tolerable elapsed time. These Big Data necessitate new forms of processing to deliver high veracity (and low vulnerability) and to enable enhanced decision making, insight, knowledge discovery, and process optimization.

Business Analytics: Refers to the development of skills and technologies, as well as applications and practices, for continuous iterative exploration, investigation, and understanding of past business performance to gain new insight and drive business planning. It aims to develop quantitative processes for a business to reach optimal decisions and to perform business knowledge discovery.

Frequent Pattern Mining: Searches and analyzes high volumes of valuable data for implicit, previously unknown, and potentially useful patterns consisting of frequently co-occurring events or objects. It helps discover frequently collocated trade fairs and frequently purchased bundles of merchandise items.
