Analyzing Big Data Using Recent Machine Learning Techniques to Assist Consumers in Online Purchase Decision

DOI: 10.4018/978-1-6684-8753-2.ch003

Abstract

Sentiments can be expressed in a variety of ways, such as angry, happy, sad, or surprised. Recent machine learning (ML) algorithms classify sentiments and assist customers in their purchase decisions. Many organizations try to predict the correlation between business growth and customer satisfaction. On different social media platforms, customers give ‘ratings’ to a specific product or service. ML helps identify the reasons behind those ratings and helps businesses improve in their weaker areas. Natural language processing integrates data and applies “tokenization” to extract the tokens (words) from the datasets (feedback). The tokens can be compared against a set of positive, negative, and neutral words and sentences to find the relevance. Classifiers such as Naïve Bayes and KNN help identify the trend and process a large volume of data in minimal time. This approach helps increase the predictive power of the model and test the remaining data. Bayesian factor robustness helps analyze different attribute specifications from the large volume of data.
Introduction

Sentiment analysis is also known as opinion mining, opinion extraction, sentiment mining, and subjectivity analysis. Analyzing sentiments is a ‘key trend’ in many business applications because it directly affects business growth. It processes the given data and retrieves the desired sentiments based on mood, emotion, interpersonal stances, and attitudes. Using a bag of words, annotated lexicons, syntactic patterns, and paragraphs, one can interpret the features of sentiments. It has applications in various industries such as hotels, education, and medicine. The ultimate objective is to improve the quality of services and products. Collecting feedback continuously and improving accordingly helps businesses grow rapidly. Machine learning (ML) tools and techniques help identify different patterns in the feedback received from ‘n’ customers. One can apply classification methods such as Naïve Bayes to classify the given data for prediction. Other techniques, such as linear regression, deep learning methods, and support vector machines (SVM), can also be used. A model can be designed to help developers extract the polarity of words or sentences on the following scale:

  • a. Positive

  • b. Very positive

  • c. Neutral / can’t say

  • d. Negative

It is also possible to apply a rating scale, for example from 10 (best) to 01 (worst). Many businesses apply such techniques to collect feedback from customers. In many countries, a consumer confidence index (CCI) based on sentiment analysis is prepared to analyze business growth. Studies suggest that this index gives a brief idea of how people think about government policies or a specific business.
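As a minimal sketch, the 10-to-01 rating scale mentioned above could be mapped onto the four polarity labels as follows. The function name `rating_to_polarity` and the cut-off points are illustrative assumptions, not values given in the chapter.

```python
def rating_to_polarity(rating: int) -> str:
    """Map a 1-10 rating (10 = best, 1 = worst) to a polarity label.

    The thresholds below are assumed for illustration only.
    """
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    if rating >= 9:
        return "very positive"
    if rating >= 7:
        return "positive"
    if rating >= 5:
        return "neutral"
    return "negative"


if __name__ == "__main__":
    for r in (10, 7, 5, 2):
        print(r, "->", rating_to_polarity(r))
```

In practice the thresholds would be tuned to the rating distribution of the platform in question.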

Table 1. Different categories of the sentiments

Happy     Surprised   Not happy
Angry     Neutral     Disappointed
Excited   Scared      Kind

A sentiment analysis model can consist of the following steps:

  • a. Extracting feedbacks from different platforms and applying “tokenization”

  • b. Vectorization

  • c. Training & Testing the selected data

  • d. Testing the remaining data

  • e. Evaluating the results

  • f. Predicting the future data and assisting the customers in purchase decisions
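The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration using only the standard library, with a small hand-written multinomial Naïve Bayes classifier (add-one smoothing). The feedback sentences, the `TRAIN` and `HELD_OUT` sets, and all function and class names are hypothetical and invented for this example; a real system would use far more data and typically an established ML library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy feedback, invented for illustration only.
TRAIN = [
    ("great product, works well", "positive"),
    ("excellent quality and fast delivery", "positive"),
    ("love it, great value", "positive"),
    ("terrible quality, broke quickly", "negative"),
    ("bad service and slow delivery", "negative"),
    ("awful product, waste of money", "negative"),
]
HELD_OUT = [("great quality", "positive"), ("terrible service", "negative")]


def tokenize(text):
    """Step a: lowercase the feedback and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vectorize(text):
    """Step b: bag-of-words vector mapping each token to its count."""
    return Counter(tokenize(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, vectors, labels):
        """Step c: count documents per class and words per class."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for vec, label in zip(vectors, labels):
            self.word_counts[label].update(vec)
            self.vocab.update(vec)
        return self

    def predict(self, vec):
        """Return the class with the highest log-posterior score."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word, count in vec.items():
                if word in self.vocab:  # ignore words never seen in training
                    prob = (self.word_counts[label][word] + 1) / denom
                    score += count * math.log(prob)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, samples):
    """Steps d and e: test on held-out data and report the hit rate."""
    hits = sum(model.predict(vectorize(t)) == y for t, y in samples)
    return hits / len(samples)


if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    model = NaiveBayes().fit([vectorize(t) for t in texts], labels)
    print("held-out accuracy:", accuracy(model, HELD_OUT))
    # Step f: predict the polarity of a new, unseen review.
    print("new review:", model.predict(vectorize("fast delivery, great value")))
```

Keeping tokenization, vectorization, training, and evaluation as separate functions mirrors the step list and makes it straightforward to swap in another classifier, such as KNN or an SVM, at the training step.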

Types of sentiment analysis:

  • Based on Grading
    o Positive
    o Very positive
    o Negative
    o Neutral

  • Emotion detection
    o Happiness
    o Sadness
    o Anger
    o Disgust
    o Neutral

  • Aspect based sentiment analysis
    o Focuses on specific ‘product’ or ‘service’

  • Intent based sentiment analysis
    o Getting ‘intent’ of the customers & finding the ‘reasons’ behind such intents.
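To make the grading and aspect-based types more concrete, the following is a rough lexicon-based sketch. The word lists `POSITIVE` and `NEGATIVE`, the grading thresholds, and the function names are assumptions made for illustration; they are not taken from the chapter, and a production system would rely on a curated sentiment lexicon or a trained model.

```python
import re

# Tiny hypothetical lexicons, assumed for illustration only.
POSITIVE = {"good", "great", "excellent", "fast", "love"}
NEGATIVE = {"bad", "poor", "slow", "terrible", "hate"}


def grade(score):
    """Grading: turn a net lexicon score into one of four grades."""
    if score >= 2:
        return "very positive"
    if score == 1:
        return "positive"
    if score == 0:
        return "neutral"
    return "negative"


def aspect_sentiment(review, aspects):
    """Aspect-based: grade each sentence and attach it to the aspects it names.

    If an aspect appears in several sentences, the last one wins.
    """
    results = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        for aspect in aspects:
            if aspect in tokens:
                results[aspect] = grade(score)
    return results


if __name__ == "__main__":
    review = "The battery is great and charging is fast. The screen is poor."
    print(aspect_sentiment(review, ["battery", "screen", "camera"]))
```

Aspects that the review never mentions (here, the hypothetical `camera`) are simply absent from the result.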

Key Terms in this Chapter

Support Vector Machine (SVM): A supervised learning algorithm widely used for classification as well as regression problems in machine learning. It transforms the given data and helps identify the boundary between the output classes.

Normalization: It organizes data in a relational database and has different forms, such as 1NF, 2NF, 3NF, BCNF, and 5NF. It minimizes redundancy and helps avoid insertion, update, and deletion anomalies.

Data Wrangling: Mainly used in data analysis, it involves gathering data, processing it, and transforming it from one form to another. The output of data wrangling is meaningful, ready-to-use data.

Big Data: Data integrated from multiple sources, which may be in structured, unstructured, or semi-structured format. Its characteristics are velocity, volume, and variety.

Tokenization: Splitting text into words or sentences is known as tokenization. It is mainly used in natural language processing and helps convert a large volume of text into smaller parts.
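A minimal sketch of both kinds of splitting, using only Python's standard `re` module, is shown here. The function names and the simple regular expressions are assumptions for illustration; real NLP toolkits handle abbreviations, punctuation, and other languages far more carefully.

```python
import re


def sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokenize(sentence):
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


if __name__ == "__main__":
    feedback = "The phone is great. Battery life is poor!"
    for sentence in sentence_tokenize(feedback):
        print(sentence, "->", word_tokenize(sentence))
```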

Machine Learning: A subset of artificial intelligence (AI) in which systems (computers and machines) are trained to behave or work like a human. Systems learn from the available data and past experience. Examples include self-driving cars, cyber fraud detection, and facial emotion recognition. Machine learning is further classified into supervised, unsupervised, and reinforcement learning.

Sentiment Analysis: It helps determine whether the given data is positive, negative, or neutral. It is an emerging field in computer science and is widely used in e-commerce applications. It also helps in gathering feedback from customers and contributes to the growth of businesses.

Classification: Categorizing the given data after successful training. In this process, class labels are predicted and a model is developed based on a predictive approach.

Natural Language Processing (NLP): A method of artificial intelligence (AI) for communicating with intelligent systems (computers or other devices) using natural languages such as English, French, or Japanese. Once NLP is successfully implemented, intelligent systems can perform tasks in a repetitive manner.

Hadoop: An open-source software platform that processes large datasets. It creates a ‘cluster’ of different machines where the data is stored, and all the data can be made available for analysis in parallel.
