Error Types in Natural Language Processing in Inflectional Languages

Gregor Donaj (University of Maribor, Slovenia) and Mirjam Sepesy Maučec (University of Maribor, Slovenia)
Copyright: © 2021 | Pages: 14
DOI: 10.4018/978-1-7998-3479-3.ch006

Abstract

This article presents the challenges that natural language processing applications face when they are used with inflectional languages. Two typical applications are presented: automatic speech recognition and machine translation. The article gives an overview of these applications and of the properties of inflectional languages, with examples from the highly inflectional Slovene language. It then presents an error classification with examples, again with an emphasis on inflectional languages, and outlines some directions for further research in this area.

Background

Natural Language Processing

Large amounts of data are created online every day, and much of it is text in a natural language. The goal is to process and examine this data to uncover new knowledge. Computers handle structured data well, but natural language text is unstructured and requires specialised processing approaches. The research field that addresses this problem is called Natural Language Processing.

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence that focuses on enabling computers to understand and process natural languages as humans do. Although NLP research has a long history, many problems remain unsolved, and computers are still far behind human abilities; we still do not know precisely how humans process language. Nevertheless, computers today offer many useful applications based on natural language. Advances in machine learning have spurred remarkable technological breakthroughs: NLP has evolved from a time-consuming process in which humans wrote rules by hand to machine learning approaches, including unsupervised learning, in which computers learn from data by themselves.

Many interesting tasks are based on NLP:

  • Document summarization: Automatically generating synopses of large bodies of text.

  • Automatic speech recognition: Transforming voice into written text.

  • Speech synthesis: Transforming text into voice.

  • Machine translation: Automatic translation of a text (or speech) from one language to another.

  • Sentiment analysis: Identifying the emotions and subjective opinions within large amounts of text.

  • Semantic analysis: Interpreting the logical meaning of human sentences.

  • Natural language understanding: Transforming the meaning of a text into a structured semantic form.

  • Natural language generation: Generating text from structured data in a readable format with meaningful phrases and sentences.

  • Question answering: Generating answers to questions in the form of a sentence. It builds on natural language understanding.

  • Dialogue systems with virtual assistants: Conducting a natural dialogue that mimics interaction with a live agent.

Every day, new ideas for applying NLP arise. The goal of almost all NLP tasks is to take raw language input (in written or spoken form) and use linguistic knowledge and algorithms to deliver higher value to the user.

The rest of this article focuses on two NLP tasks: Automatic Speech Recognition and Machine Translation.

Key Terms in this Chapter

Speech Recognition Error: The omission, insertion, or misrecognition of a word in the process of speech recognition.

Machine Translation Error: An error made by a machine translation system. To some extent, machine translation errors differ from the errors made by human translators.

Language Morphology: A subdiscipline of linguistics that studies the internal structure of words and how word forms are built.

Machine Translation: Translation of text or speech from the source language to the target language by a computer, with no human involvement.

Natural Language Processing: A modern research area dealing with the interaction between humans and computers through natural language.

Automatic Speech Recognition: The process by which a machine automatically recognizes spoken words in an audio signal or recording.

Speech Recognition Vocabulary: The set of words a speech recognition system is able or designed to recognize.

Inflectional Language: A language in which inflectional morphemes determine several morphological properties of a word.

Machine Translation Evaluation: Evaluation of translations produced by machine translation. The evaluation can be manual or automatic. Automatic evaluation uses an evaluation metric, which assigns a quality score to a translation by comparing it to a reference translation.
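The terms above define a speech recognition error as the omission, insertion, or misrecognition of a word, and describe automatic evaluation as scoring system output against a reference. In speech recognition these ideas are usually combined in the word error rate (WER): the reference and the recognized text are aligned with a word-level Levenshtein (edit-distance) alignment, and WER = (S + D + I) / N, where S counts substitutions (misrecognitions), D deletions (omissions), I insertions, and N is the number of reference words. The following is a minimal sketch in Python, assuming plain whitespace tokenization and no text normalization; the names `ErrorCounts` and `count_errors` are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int  # misrecognized words
    deletions: int      # omitted words
    insertions: int     # inserted words
    ref_length: int     # number of words in the reference

    @property
    def wer(self) -> float:
        """Word error rate: (S + D + I) / N."""
        if self.ref_length == 0:
            raise ValueError("reference must contain at least one word")
        return (self.substitutions + self.deletions + self.insertions) / self.ref_length


def count_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Align whitespace-tokenized texts by edit distance and count S, D and I."""
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )
    # Backtrace through the table to split the total distance into S, D and I.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                s += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(s, d, ins, n)


if __name__ == "__main__":
    # Illustrative Slovene pair: correct lemmas, wrong inflectional endings.
    counts = count_errors("to je moja hiša", "to je moje hiše")
    print(counts.substitutions, counts.deletions, counts.insertions, counts.wer)
```

The sentence pair in the usage is an illustrative example, not data from the chapter. It shows why inflection matters for this metric: the hypothesis picks the right lemmas ("moj", "hiša") but the wrong endings, yet each such word is charged as a full substitution, so the script reports 2 substitutions and a WER of 2/4 = 0.5. A lemma-aware or character-level measure would penalize these errors less, which is one reason error analysis for inflectional languages often looks beyond plain WER.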
