
Natural Language Processing as Feature Extraction Method for Building Better Predictive Models

Goran Klepac, Marko Velić

Source Title: Modern Computational Models of Semantic Discovery in Natural Language

DOI: 10.4018/978-1-4666-8690-8.ch006


Abstract

This chapter covers natural language processing techniques and their application in predictive model development. Two case studies are presented. The first case describes a project in which textual descriptions of various situations in the call center of a telecommunication company were processed in order to predict churn. The second case describes sentiment analysis of business news and discusses practical and testing issues in text mining projects. The two case studies take different approaches and are implemented in different tools. The language of the texts processed in these projects is Croatian, which belongs to the Slavic group of languages and has more complex morphology and grammar rules than English. The chapter concludes with several points on possible future research in this domain.


Introduction

In the big data era, predictive model development should not rely on internal data from structured relational databases as the only available data source. The growing volume of unstructured data offers an opportunity to exploit the potential of unstructured data sources.

This does not mean that the traditional methodology for predictive model development should be neglected; it means that it should be improved with patterns from unstructured data for better model performance. For business problems such as churn prediction, fraud detection, or other predictive modeling tasks, introducing elements (patterns) found by natural language processing into model development yields gains in model reliability and efficiency.

The traditional approach to predictive model development does not consider textual data a valuable data source for model construction. Textual data sources such as customer comments in call centers are excluded from the model development sample, even though they could contain valuable information for churn understanding and prediction, fraud understanding and prediction, modeling customer needs for the next best offer, etc. The main reason is the lack of a clear methodology for using such data, along with a common attitude that this type of data is useless for developing predictive business models based on Bayesian networks, logistic regression, neural networks, or similar techniques.

Croatian is a member of the Slavic group of languages, together with Bosnian, Slovenian, Serbian, Macedonian, Russian, Czech, Polish, Ukrainian, etc. Altogether, the Slavic group comprises 18 different languages and is spoken by more than 200 million people.

Slavic languages share the roots of many words but differ in their grammar rules. Since natural language processing techniques try to mitigate problems caused by grammatical and morphological word variation, it is reasonable to assume that it is worth experimenting with models on different languages. More on this is covered in the final sections of this chapter.
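To make the morphological problem concrete, the following is a minimal sketch of suffix stripping, one simple way of mapping inflected word forms to a shared stem. It is not the method used in the chapter and not a real Croatian stemmer: the suffix list, the minimum stem length, and the function name `crude_stem` are assumptions chosen only to illustrate how several case forms of one noun ("račun", meaning bill) can be collapsed into a single feature.

```python
# Toy suffix stripper: NOT a real Croatian stemmer, only an illustration.
# SUFFIXES is a small hypothetical sample of case endings, longest first.
SUFFIXES = ("om", "a", "u", "e", "i")
MIN_STEM_LEN = 3  # assumed guard so that very short words are left alone


def crude_stem(word: str) -> str:
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


# Nominative, genitive, dative and instrumental singular of "račun".
forms = ["račun", "računa", "računu", "računom"]
stems = {crude_stem(form) for form in forms}
```

Without such normalization, each inflected form would count as a separate token, which spreads the evidence for one concept across several sparse features. A production system would need a proper stemmer or lemmatizer for the language in question.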

This chapter presents solutions for using unstructured data (different kinds of text data), processed with natural language processing, as elements for building better business predictive models. This is illustrated with two cases.

The first case describes a scenario in which a telecom company wants to develop a churn prediction model. It shows how the company used textual data from its call center, namely customer comments written by call center operators. The collected textual data contains a variety of information, questions, and comments from customers, entered into text fields by operators. It includes questions about new services and products, notifications about equipment failure, questions about bills, etc. Natural language processing revealed patterns within the textual data that had a strong impact on churn. The recognized textual patterns led the company to conclusions about the nature and causes of churn. A characteristic of this case is its reliance on internal data sources, both structured and unstructured, where a recognized textual pattern can be joined to a unique customer. This is important when building predictive business data mining models at the customer level.
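The idea of joining textual patterns to individual customers can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the pattern dictionary `PATTERNS`, the English keywords (standing in for Croatian ones), the customer IDs, and the structured fields `tenure_months` and `monthly_fee` are all hypothetical.

```python
from collections import Counter

# Hypothetical pattern dictionary: feature name -> trigger keywords.
# English keywords stand in for the Croatian ones a real project would use.
PATTERNS = {
    "billing_question": ("bill", "invoice"),
    "equipment_failure": ("failure", "broken", "not working"),
    "service_inquiry": ("new service", "new product"),
}


def note_features(note: str) -> Counter:
    """Count keyword occurrences per pattern in a single operator note."""
    text = note.lower()
    counts = Counter()
    for feature, keywords in PATTERNS.items():
        hits = sum(text.count(keyword) for keyword in keywords)
        if hits:
            counts[feature] += hits
    return counts


def customer_feature_rows(notes, structured):
    """Aggregate note features per customer and join them to structured data."""
    per_customer = {}
    for customer_id, note in notes:
        per_customer.setdefault(customer_id, Counter()).update(note_features(note))
    rows = {}
    for customer_id, fields in structured.items():
        text_counts = per_customer.get(customer_id, Counter())
        row = dict(fields)
        for feature in PATTERNS:
            row[feature] = text_counts[feature]  # 0 when the pattern never occurred
        rows[customer_id] = row
    return rows


# Invented sample data for illustration only.
notes = [
    (101, "Customer asked why the last bill was so high."),
    (101, "Reported modem failure, line not working since Monday."),
    (102, "Asked about a new service for roaming."),
]
structured = {
    101: {"tenure_months": 14, "monthly_fee": 25.0},
    102: {"tenure_months": 40, "monthly_fee": 18.5},
    103: {"tenure_months": 3, "monthly_fee": 30.0},
}
rows = customer_feature_rows(notes, structured)
```

Each resulting row holds one customer's structured attributes plus text-derived counts, so it can be passed to any customer-level model such as logistic regression. Customers without notes simply receive zeros for the text features.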

The second case shows a different scenario: developing predictive models for the stock market. In this case, public text data is used for predictive model development. Stock market predictions are often based on previous price trends (technical analysis) and companies' financial reports (fundamental analysis). There are systems that include collaborative filtering methods (also known as Wisdom of the Crowds), where many users rate stocks in a way similar to rating movies or books on popular online systems. In addition, there are advancements in sentiment analysis, where stock market news or social network messages are processed to identify possible future trends. This section shows one case in which technical analysis, fundamental analysis, and collaborative filtering are already in use on a Croatian stock market web portal. In addition, the chapter presents the development of a sentiment analysis module for mining business news. In the effort to collect an annotated dataset that would allow for sentiment analysis, experts (brokers) are asked to annotate more than 500 business news items, i.e., RSS abstracts of the news collected by the portal's web parsers.
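As an illustration of how an expert-annotated news dataset can feed a sentiment classifier, the sketch below trains a small multinomial Naive Bayes model with add-one smoothing. Naive Bayes is an assumption chosen for brevity; this preview does not state which classifier the chapter uses. The four training headlines and their labels are invented English placeholders, not items from the annotated dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would also normalize morphology."""
    return text.lower().split()


def train(samples):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented placeholder annotations, standing in for broker-labelled RSS abstracts.
annotated_news = [
    ("profit rises after strong quarter", "positive"),
    ("company wins major contract", "positive"),
    ("shares fall after profit warning", "negative"),
    ("company reports heavy loss", "negative"),
]
model = train(annotated_news)
```

With a realistic dataset, part of the annotated items would be held out for testing so that accuracy is measured on news the model has not seen.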
