Deriving Business Value From Online Data Sources Using Natural Language Processing Techniques

Stephen Camilleri
Copyright: © 2021 | Pages: 23
DOI: 10.4018/978-1-7998-4240-8.ch002

Abstract

The wealth of information produced over the internet empowers businesses to become data-driven organizations, increasing their ability to predict consumer behavior, make more informed strategic decisions, and remain competitive in the market. However, past research has not identified which online data sources companies should choose to achieve this objective. This chapter analyses how businesses can exploit online news articles, social media messages, and user reviews using natural language processing (NLP) techniques to build business intelligence. NLP techniques help computers understand and derive valuable meaning from human (natural) languages. Following a brief introduction to NLP and a description of how these three text streams differ from each other, the chapter discusses six main factors that can assist businesses in choosing one data source over another. The chapter concludes with future directions for improving business applications involving NLP techniques.
Chapter Preview

Background

Technology has changed the way life is lived, the way interactions take place, the way games are played, the way business is conducted, and the way information is handled. What seemed impossible a few decades ago has become ubiquitous in today’s world, and people are constantly automating tasks that come ever closer to what human beings do.

Key Terms in this Chapter

Syntax: Denotes the grammatical structure.

JSON (JavaScript Object Notation): A lightweight structure for exchanging data.

RSS (Really Simple Syndication): A web feed that lets users receive updates in a structured format.

Dependency Parsing: Describes a sentence as a set of words connected together to form the sentence structure (Jurafsky & Martin, 2014).

Sentiment Analysis: Automatically analyses and classifies users’ emotions.

Tokenization: The process of further splitting each sentence into words at the spaces, with each word represented as a token.

Sentence Segmentation: Involves identifying the boundaries of sentences using punctuation and splitting the content into sentences.

Stemmatisation (stemming): The morphological process of reducing each word to its root.

Semantics: Concerned with the meaning of the words used within the grammatical structure.

Entity Linking: Refers to the automatic association of different expressions representing the same entity.

Part-of-Speech Tagging: Refers to the process whereby each word is analyzed to determine whether it is a cardinal or a derivation of a noun, verb, pronoun, preposition, adverb, conjunction, participle or article (Jurafsky & Martin, 2014).

User Mentions: A tag that contains the user’s account name preceded by the @ symbol, linking other users to a post.

NLP Parser: An application that identifies the grammatical and morphological structure of sentences, including the rules that govern the arrangement of words, the part of speech of each word (e.g., nouns, verbs), the grouping of words into phrases, and the word-order typology.

Unstructured Sources: A digital location whose content does not follow a particular format or mode.

Hashtags: A phrase that has no spaces and is preceded by a # symbol, containing important keywords or topics.

API (Application Programming Interface): Enables data to be transmitted between parties or services using programmed functions.

Lemmatization: Reducing an inflected form of a word to its basic form.

Stop Words: Common words in a particular language that are removed to avoid processing data unnecessarily.

Semi-Structured Sources: Sources that are not bound to the structure of a repository (He et al., 2013; Kaushik & Naithani, 2016).

Structured Sources: A digital location that stores data in a structured format.
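
Several of the key terms above (sentence segmentation, tokenization, stop words, hashtags and user mentions) can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names, the tiny stop-word list and the sample post are all assumptions made for illustration, and a production pipeline would more likely rely on an established NLP library than on these simple rules.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}

def segment_sentences(text):
    """Sentence segmentation: split where ., ! or ? is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Tokenization: split on spaces, then strip surrounding punctuation."""
    tokens = (t.strip(".,!?;:\"'()") for t in sentence.split())
    return [t for t in tokens if t]

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def extract_hashtags(text):
    """Hashtags: a # symbol followed by letters, digits or underscores."""
    return re.findall(r"#\w+", text)

def extract_mentions(text):
    """User mentions: an @ symbol followed by an account name."""
    return re.findall(r"@\w+", text)

# Made-up social media post used only as sample input.
post = "The launch is great! Thanks to @acme_support for the help. #NLP #BusinessValue"

sentences = segment_sentences(post)
tokens = [remove_stop_words(tokenize(s)) for s in sentences]

print(sentences)
print(tokens)
print(extract_hashtags(post), extract_mentions(post))
```

The punctuation-based splitting mirrors the definitions given in the key terms, so it will mis-handle abbreviations such as "e.g." and languages that do not separate words with spaces.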
