High Performance Computing for Understanding Natural Language

Marija Stanojevic, Jumanah Alshehri, Zoran Obradovic
DOI: 10.4018/978-1-7998-7156-9.ch010

Abstract

The amount of user-generated text available online is growing at an ever-increasing rate due to tremendous progress in enlarging inexpensive storage capacity, processing capabilities, and the popularity of online outlets and social networks. Learning language representations and solving tasks in an end-to-end manner, without the need for human-expert feature extraction and creation, has made models more accurate but also far more complex in their number of parameters, requiring parallelized and distributed resources such as high-performance computing (HPC) or cloud infrastructure. This chapter gives an overview of state-of-the-art natural language processing problems, algorithms, models, and libraries. Parallelized and distributed approaches to text understanding, representation, and classification tasks are also discussed. Additionally, the importance of high-performance computing for natural language processing applications is illustrated through the details of a few specific applications that use pre-training or self-supervised learning on large amounts of text data.

Introduction

The exponential data explosion requires developing practical tools for efficient and accurate pattern discovery, classification, representation, and trend and anomaly detection in large-scale, high-dimensional textual data (Szalay & Gray, 2006). For a decade now, IBM has been using high-performance computing (HPC) to analyze text and create intelligent machines. IBM Watson is a supercomputer that famously leveraged language analysis to win the quiz show Jeopardy! (Hemsoth, 2011).

Advances in natural language processing (NLP) are essential for achieving real artificial intelligence. Language is considered one of the most complex human inventions and essential to human intelligence and social integration. Therefore, success in NLP is a prerequisite for fully functioning, artificially intelligent machines.

Industry is currently the largest contributor to NLP development because of its practical importance in handling large amounts of unstructured online data. Understanding public opinion through user-generated text analysis guides more informed decisions, policies, and products. Due to the increased use of online social networks, forums, blogs, product reviews, and news comments, it has become easy to collect the extensive amounts of text needed for understanding opinions and facts about specific topics. The ability to understand such texts fully can shape politics, marketing, and many other fields.

As natural language models have become more complex in recent years, the usage of HPC locally or in the cloud has become inevitable in NLP applications. Most novel NLP models are based on neural networks, whose forward and backward propagation can be reduced to vast matrix (tensor) multiplications. Therefore, Graphics Processing Unit (GPU) or Tensor Processing Unit (TPU) hardware is used for faster training. To enhance these models' speed and usability, they are mostly implemented in a distributed manner and expected to run on high-performance parallel computing systems.
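As a minimal sketch of this hardware mapping (the layer sizes, batch shapes, and variable names below are invented for illustration), PyTorch lets the same forward and backward passes run on a GPU when one is available:

    import torch

    # Use a GPU if one is available; otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # A toy batch: 32 "sentences", each represented by a 128-dim embedding.
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 2, (32,), device=device)

    # A single linear layer; its forward pass is one matrix multiplication.
    model = torch.nn.Linear(128, 2).to(device)
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    logits = model(x)    # forward pass: one large matrix multiplication
    loss = loss_fn(logits, y)
    loss.backward()      # backward pass: more matrix multiplications
    optimizer.step()

The same idea carries over to TPUs, where tensors and models are placed on TPU devices instead.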

Some popular libraries used in implementing and evaluating the most recent natural language models are: NLTK (Loper & Bird, 2002), Gensim (Rehurek & Sojka, 2010), SpaCy (SpaCy, 2020), TensorFlow (Abadi et al., 2016), PyTorch (Paszke et al., 2019), Keras (Chollet, 2017), and scikit-learn (Pedregosa et al., 2011). All of them support parallel and distributed processing, most support GPUs, and some even run on TPU hardware. Many of these frameworks are easy to learn and provide complex neural network and machine learning modules ready for use. For practitioners wanting to create and parallelize their own algorithms in Python, the open-source library Dask (Dask Development Team, 2019) natively scales Python code. Google has also recently developed JAX (Google, 2020), which can transform plain Python code to allow backpropagation through it. This framework enables additional training speed-ups through an innovative combination of operations and the simple pmap transformation, which makes an algorithm parallelizable and easy to execute on HPC.
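As a minimal sketch of these two JAX transformations (the loss function and array shapes below are invented for illustration), jax.grad differentiates a plain Python function and jax.pmap replicates its computation across all available devices:

    import jax
    import jax.numpy as jnp

    # A plain Python loss function: mean squared error of a linear model.
    def loss(w, x, y):
        return jnp.mean((jnp.dot(x, w) - y) ** 2)

    # jax.grad transforms the function so it can be backpropagated through.
    grad_loss = jax.grad(loss)

    # jax.pmap runs one slice of the leading axis on each available device.
    n = jax.local_device_count()
    ws = jnp.broadcast_to(jnp.ones(4), (n, 4))  # weights replicated per device
    xs = jnp.ones((n, 8, 4))                    # one data shard per device
    ys = jnp.zeros((n, 8))
    per_device_grads = jax.pmap(grad_loss)(ws, xs, ys)  # shape (n, 4)

In a data-parallel training loop, the per-device gradients would typically be averaged with a collective such as jax.lax.pmean before updating the weights.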

Figure 1. Common NLP Applications

Using these and similar frameworks, researchers and practitioners have created data mining and machine learning-based algorithms for different NLP applications, which are listed below and summarized in Figure 1.

Some common NLP applications are:

1. Modeling public opinion from social media and news on different topics (e.g., politics, racism, COVID-19, vaccination);

2. Understanding a person's state and behavior (e.g., depression, suicidal thoughts, interest in products, dementia);

3. Sentiment analysis, whose goal is to predict the emotion of a given text;

4. Text classification, which categorizes text into predefined categories and is typically framed as a supervised machine learning problem (see the sketch after this list);

5. Understanding and summarizing large amounts of scientific or legal documents;

6. Translation between multiple languages (possibly simultaneously);

7. Chatbots and dialog systems, which can carry out full textual conversations with a human agent or another machine;

8. Automatic question answering, where machines learn how to answer requests coming from humans; and

9. Transcription systems, which aim to teach machines to transcribe voice to text or text to voice.
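As a minimal sketch of the text classification task from item 4 (the tiny corpus, labels, and model choice below are invented for illustration; TF-IDF features with logistic regression are just one common baseline in scikit-learn):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # A tiny invented corpus with binary sentiment labels (1 = positive).
    texts = ["great product, works well",
             "terrible, broke after a day",
             "really happy with this purchase",
             "a complete waste of money"]
    labels = [1, 0, 1, 0]

    # TF-IDF turns each document into a sparse feature vector;
    # logistic regression then learns a linear decision boundary over it.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    print(model.predict(["very happy with this product"]))  # expected: [1]

The same pipeline also serves as a simple baseline for the sentiment analysis task in item 3, since predicting the emotion of a text is a special case of text classification.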
