Enhancing Automatic Speech Recognition and Speech Translation Using Google Translate

DOI: 10.4018/978-1-6684-8145-5.ch012

Abstract

Cutting-edge technologies such as spoken language translation make communication possible between speakers of different languages. Automatically recognizing and translating speech in real-world conditions remains a particularly challenging part of spoken language translation. Translating spoken words directly from one language to another is difficult because both linguistic and non-linguistic attributes must be fundamentally transformed. Basic modules such as Google Translate and text-to-speech can be combined to achieve this. The resulting speech recognition model not only demonstrates technological proficiency but also offers a helpful platform for people with hearing impairments. Most hearing-impaired people find communication difficult because they rely on lipreading or other specialized aids and struggle to follow general information. For them, speech recognition software could offer a comprehensive long-term solution, because live speech-to-text conversion improves their ability to communicate.
Chapter Preview

Recent Developments In Speech Recognition

XIMERA Model

XIMERA is made up of several main modules: a text processing module, a prosodic parameter generation module, a segment selection module, a waveform module, and a development module. It is designed to support Chinese and Japanese. Its language-dependent components are the text processing module, the speech corpora, the acoustic models used for parameter generation, and the cost function used for segment selection, which is tied to the target language. At present, XIMERA concentrates on a reading voice style suited to news reading and to emotionally neutral conversations between humans and machines. XIMERA's salient characteristics are the following:

  1. A large speech corpus: nearly 110 hours from a Japanese male speaker, 60 hours from a Japanese female speaker, and 20 hours from a Chinese female speaker.

  2. Prosodic parameter generation using hidden Markov models (HMMs).

  3. A segment selection cost function that has been improved using perceptual studies.
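
The idea of a segment selection cost function can be illustrated with a minimal sketch. It is not XIMERA's actual cost function, which the preview does not specify. It assumes a common formulation in which each candidate segment is scored by a target cost (mismatch with the desired prosody) plus a concatenation cost (discontinuity at the join), and the cheapest sequence is found by dynamic programming. The Segment fields, the weights, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical speech segment described by two prosodic features."""
    name: str
    pitch: float     # Hz
    duration: float  # seconds


def target_cost(target, cand, w_pitch=1.0, w_dur=100.0):
    """Mismatch between a candidate and the desired prosody (weights are arbitrary)."""
    return (w_pitch * abs(target.pitch - cand.pitch)
            + w_dur * abs(target.duration - cand.duration))


def concat_cost(prev, cand, weight=2.0):
    """Pitch discontinuity at the join between two consecutive segments."""
    return weight * abs(prev.pitch - cand.pitch)


def select_segments(targets, candidates):
    """Return (total_cost, segments) minimising target plus concatenation costs.

    candidates[i] is the non-empty list of corpus segments usable at position i.
    """
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one non-empty candidate list per target")
    # costs[j]: cheapest cost of any path ending in candidate j of the current position
    costs = [target_cost(targets[0], c) for c in candidates[0]]
    back = []  # back[i-1][j]: best predecessor index for candidate j at position i
    for i in range(1, len(targets)):
        prev_cands = candidates[i - 1]
        new_costs, pointers = [], []
        for cand in candidates[i]:
            joins = [costs[p] + concat_cost(prev_cands[p], cand)
                     for p in range(len(prev_cands))]
            k = min(range(len(joins)), key=joins.__getitem__)
            new_costs.append(joins[k] + target_cost(targets[i], cand))
            pointers.append(k)
        costs = new_costs
        back.append(pointers)
    # Trace the cheapest path back from the last position.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[i][idx] for i, idx in enumerate(path)]


# Toy data: two target positions with two candidate segments each.
TARGETS = [Segment("t1", 120.0, 0.1), Segment("t2", 130.0, 0.1)]
CANDIDATES = [
    [Segment("a1", 120.0, 0.1), Segment("a2", 128.0, 0.1)],
    [Segment("b1", 150.0, 0.1), Segment("b2", 131.0, 0.1)],
]
TOTAL, CHOSEN = select_segments(TARGETS, CANDIDATES)
```

In this toy data the search picks a2 followed by b2 (total cost 15.0), even though a1 matches its target exactly, because the smoother join outweighs the small target mismatch. Tuning such weights against listener judgments is one way a cost function could be improved using perceptual studies.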

Key Terms in this Chapter

Machine Translation: Machine translation is the automatic conversion of text from one language to another. No human translator is involved, and artificial intelligence plays an important role.

Automatic Speech Recognition: Automatic speech recognition (ASR) lets users of information systems enter data by speaking rather than by typing on a keypad. ASR is mainly used to provide information and to forward calls.

Googletrans: Googletrans is a free and unlimited Python library that implements the Google Translate API. It uses the Google Translate Ajax API to call methods such as detect and translate.
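
A minimal sketch of the detect and translate calls follows. It assumes the synchronous Translator interface found in googletrans 3.x and 4.0.0rc1; some newer releases expose these methods as coroutines that must be awaited. The helper name translate_text and its translator parameter are hypothetical and exist only so the sketch can be run offline with a stand-in object.

```python
def translate_text(text, dest="en", translator=None):
    """Detect the language of `text` and translate it into `dest`.

    Returns a (detected_language_code, translated_text) pair.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if translator is None:
        # Imported lazily: requires `pip install googletrans` and network access.
        # Assumption: the installed release has a synchronous Translator.
        from googletrans import Translator
        translator = Translator()
    detected = translator.detect(text)  # result has .lang and .confidence
    translated = translator.translate(text, src=detected.lang, dest=dest)
    return detected.lang, translated.text
```

With the real library and a network connection, translate_text("Bonjour le monde", dest="en") would return the detected source language code together with the English text. The service is unofficial, so calls can fail or be rate-limited.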

gTTS: gTTS is a user-friendly Python tool that converts entered text into audio and saves it as an mp3 file. The audio can be generated at two speeds: fast (the default) and slow.
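
The following sketch shows how text could be converted to an mp3 file with gTTS. It uses the library's gTTS class with its text, lang and slow arguments and its save method. The wrapper text_to_mp3 and its tts_factory parameter are hypothetical and are there only so the sketch can be exercised without a network connection.

```python
def text_to_mp3(text, path, lang="en", slow=False, tts_factory=None):
    """Synthesise `text` as speech and save it to `path` as an mp3 file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if tts_factory is None:
        # Imported lazily: requires `pip install gTTS` and network access.
        from gtts import gTTS
        tts_factory = gTTS
    # slow=True requests the slower of the two speaking speeds.
    tts = tts_factory(text=text, lang=lang, slow=slow)
    tts.save(path)
    return path
```

For example, text_to_mp3("Good morning", "greeting.mp3", slow=True) would write the slower reading to greeting.mp3, which can then be played with any audio player.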

Speech Translation: Speech translation is the process by which conversational spoken sentences are quickly translated and spoken aloud in a second language. It differs from phrase translation, in which the system translates only a fixed, finite set of phrases that have been manually entered into it.

Hidden Markov Model: A hidden Markov model (HMM) is a statistical model used mainly to describe how observed events evolve when they depend on internal factors (hidden states) that are not directly observable. It is the foundation of many modern algorithms.
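
As an illustration of how hidden states are inferred from observations, the sketch below implements the standard Viterbi algorithm, which finds the most likely hidden-state sequence of an HMM. The two states, the "low"/"high" energy observations, and all probabilities are invented toy values and are not taken from the chapter.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    if not observations:
        raise ValueError("need at least one observation")
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor state that maximises the path probability.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Toy model: is a frame silence or speech, given only its energy level?
STATES = ["silence", "speech"]
START_P = {"silence": 0.8, "speech": 0.2}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
EMIT_P = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.3, "high": 0.7},
}

PROB, PATH = viterbi(["low", "high", "high"], STATES, START_P, TRANS_P, EMIT_P)
```

For the observation sequence low, high, high, the most likely hidden sequence in this toy model is silence, speech, speech, with probability 0.084672.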

Natural Language Processing: The main purpose of natural language processing (NLP) is to enable computer systems to interpret natural human language.
