Learning Algorithms of Sentiment Analysis: A Comparative Approach to Improve Data Goodness

Suania Acampa, Ciro Clemente De Falco, Domenico Trezza
DOI: 10.4018/978-1-7998-8473-6.ch012

Abstract

The uncritical application of automatic text-analysis techniques can be insidious, which is why the scientific community has taken a strong interest in supervised approaches. But is supervision enough? This chapter addresses these issues by comparing three machine learning approaches to measuring sentiment. The case study is the sentiment expressed by Italians on Twitter during the first post-lockdown day. To initialise the supervised models, a stratified daily sample of tweets was built and classified manually. The analysis model provides for a further step at the end of the process, useful for comparing the three approaches: an index is built on the processed tweets with the aim of assessing the goodness of the results produced. Comparing the three algorithms helps the authors understand not only which approach works best for the Italian language, but also which strategy can verify the quality of the data obtained.
Chapter Preview

Introduction

Big Corpora and Digital Methods: A Critical Approach to Improve Data Goodness

The ubiquity of digital technologies and the popularity of opinion-rich platforms such as social media and review sites generate a large and rapidly growing volume of user-generated data encoded in natural language every day. Reviews, tweets, likes, links, shares, texts, posts, tags, etc. are only part of the billions of digital traces that we leave on the web daily, through which it is possible to accurately trace the tastes, opinions, and attitudes of everyone. Big corpora represent a profitable empirical basis for all those who investigate social phenomena on the net. The production and increasing availability of data offer new possible forms of knowledge of social complexity that social researchers cannot ignore. The data revolution is considered as “the sum of the disruptive social and technological changes that are transforming the routine of construction, management and analysis of data consolidated within the various scientific disciplines” (Amaturo and Aragona, 2017, p. 1). New digital technologies and big data allow social research to move from constructing empirical bases through interrogation to constructing them through survey. Big data allow us to measure complex phenomena in detail and in real time, thanks to the evolution of IT tools and techniques such as artificial intelligence, machine learning, and natural language processing. This promotes interdisciplinarity between different scientific areas and provides social researchers with solid empirical bases for experimenting with and integrating new and traditional approaches to social research. These technologies push the social sciences into a scenario in which “web-mediated research [...] is already transforming the way researchers practice traditional research methods transposed to the web” (Amaturo and Punziano, 2016, pp. 35-36).

To describe and analyse this wealth of information, social scientists have also begun to use computational analytical methods to assemble, filter and interpret user-generated data encoded in natural language. Text mining belongs to this context: a branch of data mining that makes it possible to analyse vast textual corpora in different languages, extracting high-quality information with very limited manual intervention. Natural language processing (NLP) is the area of machine learning dedicated to extracting meaning from the written word.

A very profitable branch of natural language processing is sentiment analysis: the extraction and analysis of the opinions that users express on the web towards products, services, topics or public figures. Combining language processing and text analysis, sentiment analysis identifies subjective information in sources. The main objective is to determine the general polarity of a text (whether a review or a comment) and classify it into one of three categories: positive, negative or neutral. Sentiment analysis techniques are divided according to the type of approach used: lexicon-based or machine learning. The machine learning approach treats sentiment classification as a general text-classification problem and is in turn divided into unsupervised and supervised learning models. In supervised models it is necessary to prepare a training set labelled with the polarity of the sentiment (negative, positive, neutral), which the algorithm uses to predict the polarity of the other textual content contained in the test set. The machine learning approach has the advantage of not depending on the availability of dictionaries, but the accuracy of the classification methods depends heavily on the correct labelling of the texts used for training and on a careful selection of features by the algorithm. The results of the three supervised algorithms were adopted and compared through an analysis model that involved the construction of the labelled training set on which the three models were tested to evaluate the accuracy of each. The next step involved recoding the processed tweets based on their agreement or discrepancy with the output returned by each of the three algorithms. The tweet analysis allows us to define the components (text, sentiment, and other features) that suggest a plausible relationship with the functioning of the algorithm.
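The supervised workflow described above, a manually labelled training set from which an algorithm learns to predict the polarity of unseen texts, can be sketched with a minimal Naive Bayes classifier. This is a toy illustration with hypothetical example texts, not the chapter's actual corpus or pipeline:

```python
import math
from collections import Counter, defaultdict

# Tiny hand-labelled training set (hypothetical stand-in tweets):
# each text is tagged with one of the three polarity classes.
train = [
    ("great day finally outside again", "positive"),
    ("so happy the lockdown is over", "positive"),
    ("terrible crowds everywhere today", "negative"),
    ("still afraid this reopening is a mistake", "negative"),
    ("shops reopened this morning", "neutral"),
    ("phase two starts today", "neutral"),
]

def train_nb(examples):
    """Collect class counts and per-class word counts (bag of words)."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)   # label -> word frequencies
    vocab = set()
    for text, label in examples:
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the class with the highest log posterior (add-one smoothing)."""
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)          # class prior
        denom = sum(word_counts[label].values()) + len(vocab)  # smoothed denominator
        for w in text.split():
            if w in vocab:                                     # skip unseen words
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_nb(train)
print(predict("so great to be outside", *model))  # prints "positive"
```

In a real setting the training set would be the stratified, manually classified sample the chapter describes, and the features would go beyond raw word counts.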
Because these algorithms work through learning, they open up interesting developments for defining data-accuracy parameters against validated benchmarks. Our work examines sentiment in a one-day sample of tweets in Italian (May 4, 2020) related to phase 2 of the post-lockdown. The tweets were processed with the three algorithms most widely used in the literature for this type of analysis (Naive Bayes, Decision Tree and Logistic Regression). The results of the three supervised algorithms were adopted and compared on the basis of the accuracy and predictive ability of each. To check whether there were latent differences in the corpus, a lexical correspondence analysis (ACL) was used, which allowed us to define the components (text, sentiment, and other characteristics) that give us information about the functioning of the algorithms. Although these techniques are advancing rapidly and their performance improves year by year, the analysis shows that the chosen algorithms still present various limits for the Italian language.
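The comparison step can be illustrated with a short sketch that fits the three algorithms named above and reports held-out accuracy for each. It uses scikit-learn and a small stand-in labelled corpus; both are our assumptions for illustration, not the chapter's actual toolchain or data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled tweets (Italian), repeated so the split is non-trivial.
texts = [
    "finalmente fuori che bella giornata", "sono felice il lockdown e finito",
    "troppa gente in giro che disastro", "riaprire ora e un errore",
    "oggi riaprono i negozi", "inizia la fase due",
] * 10
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"] * 10

# Bag-of-words features, then a stratified train/test split.
X = CountVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)

models = {
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
scores = {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)                               # train on the labelled set
    scores[name] = accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: {scores[name]:.2f}")
```

On real Italian tweets the three accuracies would diverge, and that divergence, together with the agreement/discrepancy recoding described above, is what the comparison is designed to surface.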
