A Free Service of IGI Global Publishing House

What is Automatic Speech Recognition

Encyclopedia of Artificial Intelligence
Machine recognition and conversion of spoken words into text.
Published in Chapter:
Analytics for Noisy Unstructured Text Data I
Shourya Roy (IBM Research, India Research Lab, India) and L. Venkata Subramaniam (IBM Research, India Research Lab, India)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-59904-849-9.ch015
Abstract
Accdrnig to rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer be at the rghit pclae. Tihs is bcuseae the human mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.1 Unfortunately, computing systems are not yet as smart as the human mind. Over the last couple of years, a significant number of researchers have been focussing on noisy text analytics. Noisy text data is found in informal settings (online chat, SMS, e-mails, message boards, among others) and in text produced by automated speech recognition or optical character recognition systems. Noise can degrade the performance of other information processing algorithms such as classification, clustering, summarization and information extraction. We will identify some of the key research areas for noisy text and give a brief overview of the state of the art. These areas are (i) classification of noisy text, (ii) correcting noisy text, and (iii) information extraction from noisy text. We will cover the first in this chapter and the latter two in the next chapter.
We define noise in text as any kind of difference in the surface form of an electronic text from the intended, correct or original text. We see such noisy text every day in various forms. Each form has unique characteristics and hence requires special handling. We introduce some such forms of noisy textual data in this section.
Online Noisy Documents: E-mails, chat logs, scrapbook entries, newsgroup postings, threads in discussion fora, blogs, etc., fall under this category. People are typically less careful about the sanity of written content in such informal modes of communication. These documents are characterized by frequent misspellings, commonly and not so commonly used abbreviations, incomplete sentences, missing punctuation and so on. Noisy documents are almost always human interpretable, if not by everyone, at least by their intended readers.
SMS: Short Message Services are becoming more and more common. Language usage in SMS text differs significantly from the standard form of the language. The urge towards shorter messages, which facilitates faster typing, and the need for semantic clarity shape the structure of this non-standard form, known as the texting language (Choudhury et al., 2007).
Text Generated by ASR Devices: ASR is the process of converting a speech signal to a sequence of words. An ASR system takes a speech signal such as monologs, discussions between people, telephonic conversations, etc. as input and produces a string of words, typically not demarcated by punctuation, as a transcript. An ASR system consists of an acoustic model, a language model and a decoding algorithm. The acoustic model is trained on speech data and the corresponding manual transcripts. The language model is trained on a large monolingual corpus. An ASR system converts audio into text by searching the acoustic model and language model space using the decoding algorithm. Most conversations between agents and customers at contact centers today are recorded. To process this data and obtain customer intelligence, it is necessary to convert the audio into text.
Text Generated by OCR Devices: Optical character recognition, or ‘OCR’, is a technology that allows digital images of typed or handwritten text to be converted into an editable text document. It takes a picture of text and translates the text into Unicode or ASCII. For handwritten optical character recognition, the rate of recognition is 80% to 90% with clean handwriting.
Call Logs in Contact Centers: Today’s contact centers (also known as call centers, BPOs, KPOs) produce huge amounts of unstructured data in the form of call logs, apart from emails, call transcriptions, SMS, chat transcripts, etc. Agents are expected to summarize an interaction as soon as they are done with it and before picking up the next one. Because agents work under immense time pressure, the summary logs are very poorly written and sometimes difficult even for humans to interpret. Analysis of such call logs is important for identifying problem areas, agent performance, evolving problems, etc.
In this chapter we will be focussing on automatic classification of noisy text. Automatic text classification refers to segregating documents into different topics depending on content. An example is categorizing customer emails by topic, such as billing problem, address change, product enquiry, etc. It has important applications in email categorization, building and maintaining web directories (e.g. DMoz), spam filtering, automatic call and email routing in contact centers, pornographic material filtering and so on.
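The email categorization example from the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method described in the chapter: it uses a multinomial naive Bayes classifier over character trigrams, a technique chosen here because overlapping character sequences tend to survive misspellings and abbreviations. The class name `NoisyTextClassifier`, the helper `char_ngrams`, and the six training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NoisyTextClassifier:
    """Multinomial naive Bayes over character trigrams (hypothetical, illustrative only)."""

    def __init__(self):
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram frequency
        self.doc_counts = Counter()               # label -> number of training documents
        self.vocab = set()                        # all n-grams seen in training

    def train(self, examples):
        for text, label in examples:
            grams = char_ngrams(text)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            # Log prior plus add-one smoothed log likelihood of each trigram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data using the topics named in the abstract.
examples = [
    ("I was charged twice on my bill this month", "billing problem"),
    ("my invoice shows the wrong amount", "billing problem"),
    ("please update my address, I have moved to a new house", "address change"),
    ("I am shifting to a new city, change my mailing address", "address change"),
    ("what is the price of the new phone model", "product enquiry"),
    ("does this product come with a warranty", "product enquiry"),
]

clf = NoisyTextClassifier()
clf.train(examples)

# A noisy, SMS-style query with misspellings and abbreviations.
print(clf.predict("pls chnge my adress i hv moved"))
```

Word-level features would miss "chnge" and "adress" entirely, whereas trigrams such as "nge", "dre" and "ess" still overlap with the clean training text. A real system would need far more training data and a proper evaluation on held-out noisy text.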
More Results
Enhancing Automatic Speech Recognition and Speech Translation Using Google Translate
With automated speech recognition (ASR), users of information systems can enter data by speaking it rather than typing numbers into a keypad. The main uses of ASR are providing information and call forwarding.
Analytics for Noisy Unstructured Text Data II
Machine recognition and conversion of spoken words into text.
Using Machine Learning to Extract Insights From Consumer Data
A field of computer science and a class of methods that enable the recognition and translation of spoken language into written text that can be processed by computer systems.
A Rural Healthcare Mobile App: Urdu Voice-Enabled Mobile App for Disease Diagnosis
A technology that allows a human being to converse with a computer in order to give some commands or generate a written transcript.
Embodied Conversation: A Personalized Conversational HCI Interface for Ambient Intelligence
An approach to converting an uttered speech signal into computer-readable text.
Error Types in Natural Language Processing in Inflectional Languages
The automatic process by which a machine recognizes spoken words from an audio signal or recording.