Hybrid Clustering Technique to Cluster Big Data in the Hadoop Ecosystem: Big Data Application

E. Padmalatha, S. Sailekya

Source Title: Handbook of Research on Technologies and Systems for E-Collaboration During Global Crises

DOI: 10.4018/978-1-7998-9640-1.ch015

Abstract

Big data analytics and data mining play vital roles in extracting hidden information from data. Conventional approaches to analysing data and extracting hidden information may not work efficiently on big data because of its complexity and high volume. Data clustering is a data mining technique that extracts useful information by grouping data into clusters. Because big data is complex and very large, individual clustering techniques may not consider all the samples, which can lead to inaccurate results. To overcome this inaccuracy, the proposed method combines dynamic k-means with hierarchical clustering and can therefore be called a hybrid method. The hybrid approach overcomes drawbacks such as a static k value. In this chapter, the proposed method is compared with existing algorithms using several clustering metrics.

Chapter Preview

Introduction

Big data analytics has become a trend in the market and is used to extract hidden patterns and unknown correlations, helping organizations make decisions. If big data is the problem, Hadoop, available as an open-source framework, is a solution for handling it. Clustering is one of the techniques used to extract insights from big data (Raghupathi & Raghupathi, 2014). Traditional clustering techniques may not cluster big data efficiently. Consequently, there remains a need to design an efficient and highly scalable clustering algorithm. This has motivated the proposal of a novel hybrid clustering algorithm for big data in the Hadoop ecosystem (Katal et al., 2013). In big data analysis, individual clustering techniques such as k-means and hierarchical clustering may not consider all the samples, which leads to inaccurate results. The proposed method combines k-means and hierarchical clustering to compensate for the limitations of each algorithm on its own. Drawbacks of traditional clustering algorithms include the following: in k-means clustering it is hard to predict the k value, and with a wrong k many data points may not fit well into any cluster; hierarchical clustering requires many merge and split decisions and iterations (Aggarwal & Zhai, 2012).
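The preview does not spell out how the two algorithms are combined, so the following is only a minimal sketch of one common way to pair them, not the authors' method. It assumes that k-means is first run with a deliberately generous `initial_k`, and that the resulting groups are then merged hierarchically until the closest pair of centroids is farther apart than a `threshold`. The function names, the evenly spaced seeding, and the centroid-distance merge rule are all illustrative assumptions.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; seeds are evenly spaced input points (an assumption)."""
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid when a group ends up empty.
        updated = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return [g for g in groups if g]


def merge_groups(groups, threshold):
    """Agglomerative step: merge the two closest groups (by centroid distance)
    until the closest pair is farther apart than `threshold`."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        cents = [mean(g) for g in groups]
        d, i, j = min(
            (math.dist(cents[i], cents[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if d > threshold:
            break
        groups[i].extend(groups.pop(j))  # j > i, so index i stays valid
    return groups


def hybrid_cluster(points, initial_k, threshold):
    """Hypothetical hybrid: over-segment with k-means, then merge hierarchically."""
    return merge_groups(kmeans(points, initial_k), threshold)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    for cluster in hybrid_cluster(data, initial_k=4, threshold=3.0):
        print(cluster)
```

In this sketch the final number of clusters is decided by the merge threshold rather than fixed in advance, which is one way to sidestep a static k. On Hadoop-scale data both steps would need a distributed implementation; this single-machine version only illustrates the idea.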

Clustering is an important tool for data mining and knowledge discovery. The aim of clustering is to discover meaningful groups of entities and to separate the clusters formed from a dataset. Conventional k-means clustering works well when applied to small datasets (Pandove & Goel, 2015). Large datasets should be clustered so that every entity or data point in a cluster is similar to the other entities in the same cluster. Clustering problems arise in several disciplines. The ability to group similar items automatically makes it possible to find hidden similarities and key concepts while condensing a large amount of data into a few groups, which helps users comprehend large amounts of data. Clusters of machines can also be classified as homogeneous or heterogeneous. In homogeneous clusters, all nodes have similar properties (Firouzi et al., 2010). Heterogeneous clusters are used in private data centers in which nodes have a variety of characteristics and in which it can be hard to characterize the nodes (Demchenko et al., 2013).

Clustering techniques require precise definitions of the similarity between observations and between clusters. When clustering is based on attributes, it is natural to use familiar concepts of distance. A difficulty with this approach lies in measuring the distance between clusters that contain two or more observations (Fernández et al., 2014). In contrast to conventional statistical methods, most clustering algorithms do not depend on the statistical distribution of the data and can therefore be useful when little prior knowledge exists about a particular problem. Ghazal et al. (2013) described how the number of iterations can be reduced by partitioning a dataset into overlapping subsets and iterating only over the data objects within the overlapping areas (Battré et al., 2010).
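The question of how to measure the distance between clusters that each hold several observations is usually answered with a linkage criterion. The preview does not say which criterion the chapter uses, so the sketch below simply illustrates the three standard choices (single, complete, and average linkage) with Euclidean distance; the function name `linkage_distance` is an illustrative assumption.

```python
import math
from itertools import product


def linkage_distance(cluster_a, cluster_b, method="average"):
    """Distance between two clusters of points under a standard linkage rule.

    single   : distance between the closest pair of points
    complete : distance between the farthest pair of points
    average  : mean distance over all cross-cluster pairs
    """
    pair_dists = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":
        return min(pair_dists)
    if method == "complete":
        return max(pair_dists)
    if method == "average":
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage method: {method!r}")


if __name__ == "__main__":
    a = [(0, 0), (0, 1)]
    b = [(3, 0), (4, 0)]
    for method in ("single", "complete", "average"):
        print(method, linkage_distance(a, b, method))
```

The choice matters: single linkage tends to chain elongated clusters together, while complete linkage favours compact ones, so the same data can yield different hierarchies.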

The remainder of this work is organized as follows. The ‘History’ section contains relevant surveys on big data clustering. We provide background on Apache Spark in ‘Research Paper’. The ‘Study Design’ section describes the survey's research methods. The ‘Survey Methods’ section goes through the various Spark clustering algorithms. We present our analysis of clustering large data with Spark, along with upcoming projects, in ‘Discussion and Future Directions’. Lastly, ‘Findings’ brings the paper to a close.

Limitations of Existing Methods

Existing big data clustering models based on honeybee, genetic, and particle swarm optimization (PSO) techniques cannot provide accurate big data storage. They do not fully resolve limitations such as the static k value, dynamic k selection, and Hadoop storage issues. Nor can these methods improve the silhouette score, the Calinski-Harabasz index, or the Davies-Bouldin index (Jiang et al., 2010).
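For readers unfamiliar with the three metrics named above, the following sketch computes each from its standard textbook definition using Euclidean distance. It is a plain single-machine illustration, not the evaluation code used in the chapter, and it assumes at least two clusters, more points than clusters, and points that are not all identical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def silhouette_score(clusters):
    """Mean of (b - a) / max(a, b) over all points; higher is better."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the point's own cluster
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin_index(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance;
    lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def calinski_harabasz_index(clusters):
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    cents = [centroid(c) for c in clusters]
    between = sum(
        len(c) * math.dist(m, overall) ** 2 for c, m in zip(clusters, cents)
    )
    within = sum(
        math.dist(p, m) ** 2 for c, m in zip(clusters, cents) for p in c
    )
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
    print("silhouette:", silhouette_score(clusters))
    print("Davies-Bouldin:", davies_bouldin_index(clusters))
    print("Calinski-Harabasz:", calinski_harabasz_index(clusters))
```

Note that the metrics point in different directions: a better clustering raises the silhouette score and the Calinski-Harabasz index but lowers the Davies-Bouldin index.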
