Hybrid Ensemble Learning With Feature Selection for Sentiment Classification in Social Media

Sanur Sharma, Anurag Jain

Source Title: International Journal of Information Retrieval Research (IJIRR) 10(2)

DOI: 10.4018/IJIRR.2020040103

Abstract

This article presents a study of ensemble learning and an empirical evaluation of various ensemble classifiers and ensemble features for sentiment classification of social media data. The data was collected from Twitter in real time using the Twitter API, and text pre-processing and ranking-based feature selection were applied to the textual data. A framework for a hybrid ensemble learning model is presented, combining ensemble features (Information Gain and Chi-Squared) with an ensemble classifier that includes AdaBoost with SMO-SVM and Logistic Regression. The Twitter data is then classified, with sentiment analysis used as a feature. The proposed model improves on state-of-the-art methods, reaching an accuracy of 88.2% with a low error rate.

Introduction

In the ongoing surge of social media, user opinions reach a worldwide audience through the internet. The posts and tweets that users share, and the level of interaction that social media allows, have immense potential to influence people. Twitter is one such medium, where users' opinions and views build their social profile and present them online. This has made Twitter a data-rich and authoritative source of shared views, which is why Twitter data has been used extensively for study and for making predictions at large. Sentiment analysis is a technique in which text is analysed and predictions are made from the user opinions derived from what has been posted on the medium. In general, sentiment analysis classifies text as positive, negative or neutral and supports the evaluation and prediction of events. Techniques for sentiment classification include machine learning techniques, where supervised, semi-supervised, unsupervised and ensemble methods have been applied to social media datasets. Lexicon-based techniques include dictionary-based and corpus-based approaches, lexicons combined with Natural Language Processing (NLP), and hybrid techniques (Medhat et al., 2014; Goyal and Bhatnagar, 2016; Hussein, 2018). Social media data includes Twitter data, social network data, movie and product review data, and more.

Social media data is heterogeneous, and its dimensionality is one of the significant factors that make processing and analysis difficult. The textual nature of the data complicates processing, and understanding the emotions behind the text is challenging. The varied number of attributes in social media data can make classification intractable. The challenges that arise in analysing such data include domain dependence, which involves topic-oriented features; negation handling, since negation alters the meaning of a word; lexicon-based features that characterise the linguistic properties of the text; part-of-speech tagging; bag of words; and term presence and frequency. Another challenge is identifying opinionated words and phrases in order to understand the contextual meaning of the text. Textual data also contains a vast set of lexicons, which makes the extraction process difficult and time-consuming. For these reasons, feature selection techniques are used to perform dimensionality reduction, removing redundant and irrelevant features to improve the classification of the text. This article considers these challenges and presents feature extraction techniques combined with ensemble learning to make the sentiment classification process more efficient (as measured by classification accuracy, F-score, etc.) and less complex. The proposed model combines various feature selection techniques, finds the best combination of feature selection methods, and then incorporates the best set of ensemble classifiers. The proposed method outperforms various state-of-the-art methods.
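As an illustration of ranking-based feature selection, the following minimal Python sketch scores each term by Information Gain and by the Chi-Squared statistic, using term presence against the class label. The toy corpus, the function names (`score_terms`, `ensemble_select`) and the rule for combining the two rankings (sum of rank positions) are assumptions made for this example only; the preview does not state how the article combines IG and CHI.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def score_terms(docs, labels):
    """Return {term: (information_gain, chi_squared)} based on term presence."""
    n = len(docs)
    classes = sorted(set(labels))
    class_totals = Counter(labels)
    token_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*token_sets)
    base = entropy([class_totals[c] for c in classes])
    scores = {}
    for term in vocab:
        # Class counts of documents that contain / do not contain the term.
        present = Counter(l for s, l in zip(token_sets, labels) if term in s)
        absent = {c: class_totals[c] - present[c] for c in classes}
        n_present = sum(present.values())
        n_absent = n - n_present
        # Information gain: H(class) - H(class | term present or absent).
        cond = (n_present / n) * entropy([present[c] for c in classes])
        cond += (n_absent / n) * entropy([absent[c] for c in classes])
        ig = base - cond
        # Chi-squared: sum over cells of (observed - expected)^2 / expected.
        chi = 0.0
        for c in classes:
            for observed, row_total in ((present[c], n_present), (absent[c], n_absent)):
                expected = row_total * class_totals[c] / n
                if expected:
                    chi += (observed - expected) ** 2 / expected
        scores[term] = (ig, chi)
    return scores


def ensemble_select(scores, k):
    """Keep the k terms with the best summed rank under IG and CHI.

    The combination rule is an assumption; ties are broken alphabetically.
    """
    by_ig = sorted(scores, key=lambda t: (-scores[t][0], t))
    by_chi = sorted(scores, key=lambda t: (-scores[t][1], t))
    rank_ig = {t: i for i, t in enumerate(by_ig)}
    rank_chi = {t: i for i, t in enumerate(by_chi)}
    return sorted(scores, key=lambda t: (rank_ig[t] + rank_chi[t], t))[:k]


# Hypothetical toy corpus, not data from the article.
toy_docs = ["good great phone", "great service good",
            "bad awful phone", "awful service bad"]
toy_labels = ["pos", "pos", "neg", "neg"]
selected = ensemble_select(score_terms(toy_docs, toy_labels), k=4)
```

On this toy corpus the sentiment-bearing words score highest under both measures, while "phone" and "service", which occur equally often in both classes, score zero and are dropped.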

This paper makes the following contributions:

• The proposed approach uses a compound feature set built with string-to-word vectorisation, an n-gram model and tf-idf (term frequency-inverse document frequency), which performs better than simpler features;
• The proposed Hybrid Ensemble Learning Method (HELM) uses ensemble features instead of a single feature set, repeating the feature extraction process to obtain the best set of features;
• The proposed approach integrates the Information Gain (IG) and Chi-Squared (CHI) feature selection algorithms, which select relevant features by evaluating their importance. The performance of the ensemble features is compared with that of single feature selection methods;
• The proposed HELM uses ensemble classifiers instead of single base classifiers. The performance of the HELM classifier (AdaBoost + SMO-SVM + Logistic Regression) is compared with various machine learning classifiers and state-of-the-art ensemble classifiers. HELM was found to outperform state-of-the-art classifiers such as Naïve Bayes, SVM, LR, SGD, RF and SMO.
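The compound feature set named in the first contribution can be sketched in a few lines of Python. The example below builds word n-grams and weights them with tf-idf. The tokenisation (lower-casing and whitespace splitting), the function names and the exact weighting (tf as relative frequency, idf as log(N / df)) are assumptions for illustration; the preview does not give the article's settings.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lower-cased word n-grams for n = 1..n_max, joined with spaces."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Return one {ngram: weight} dict per document.

    Assumed weighting: tf = count / number of n-grams in the document,
    idf = log(number of documents / documents containing the n-gram).
    """
    grams = [ngrams(d, n_max) for d in docs]
    df = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in grams:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({g: (c / total) * math.log(n_docs / df[g])
                        for g, c in counts.items()})
    return vectors


# Hypothetical two-tweet corpus, not data from the article.
vectors = tfidf_vectors(["not good", "good phone"])
```

Here "good" appears in both documents and receives weight zero, while the bigram "not good" keeps a positive weight. Treating a negated phrase as a single feature is one simple way n-grams can help with negation.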
