Reading Numbers System for Portuguese Language


João Paulo Teixeira (Polytechnic Institute of Bragança, Bragança, Portugal), Carolina Mota (Polytechnic Institute of Bragança, Bragança, Portugal) and Cátia Sampaio (Polytechnic Institute of Bragança, Bragança, Portugal)
Copyright: © 2015 | Pages: 14
DOI: 10.4018/IJRQEH.2015010102

Abstract

The paper presents an algorithm for reading common numbers up to one million in the Portuguese language. The recording and cutting of the digit speech sounds received special attention in order to improve the speech output. Special attention is also required for the correct inclusion of the particle ‘e’ (and), which makes the read numbers sound more natural. The system is able to simulate human biological speech sound production in the task of reading numbers. It is based on the concatenation of carefully recorded, edited and selected speech segments corresponding to the digits. The naturalness of the system was improved by using speech files of digits read in different positions (beginning, middle and end) and by using digits concatenated with the particle ‘e’ before, after, and both before and after the digit.

Introduction

The automatic reading of numbers is useful for several types of applications. Watches for visually impaired persons, automatic answering systems, speech interface systems with pre-recorded sentences that include numbers, and even general purpose TTS (Text-To-Speech) systems (Saraswathi, 2010; Sproat, 1997) all need at least an algorithm for reading numbers automatically. The organization of the sequence of chunks and the correct insertion of linkers such as ‘e’ (and) require language-dependent algorithms. Additionally, the position of each digit within the whole number determines its prosodic contour, namely its F0 curve and duration.
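As an illustration of the language-dependent chunking and ‘e’ insertion mentioned above, the following is a minimal sketch that converts an integer up to one million into Portuguese words. It is not the authors' algorithm. The function names are hypothetical, and the linking rule used here (an ‘e’ after ‘mil’ only when the remainder is below 100 or is an exact multiple of 100) is an assumption based on common European Portuguese usage, with masculine forms only.

```python
# Sketch only: European Portuguese spellings, masculine forms, 0..1_000_000.
UNITS = ["zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete",
         "oito", "nove", "dez", "onze", "doze", "treze", "catorze", "quinze",
         "dezasseis", "dezassete", "dezoito", "dezanove"]
TENS = {2: "vinte", 3: "trinta", 4: "quarenta", 5: "cinquenta",
        6: "sessenta", 7: "setenta", 8: "oitenta", 9: "noventa"}
HUNDREDS = {1: "cento", 2: "duzentos", 3: "trezentos", 4: "quatrocentos",
            5: "quinhentos", 6: "seiscentos", 7: "setecentos",
            8: "oitocentos", 9: "novecentos"}


def below_thousand(n):
    """Return the word list for 1..999, linking every component with 'e'."""
    if n == 100:
        return ["cem"]  # exactly one hundred is 'cem', not 'cento'
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds])
    if rest:
        if rest < 20:
            parts.append(UNITS[rest])
        else:
            tens, units = divmod(rest, 10)
            parts.append(TENS[tens])
            if units:
                parts.append(UNITS[units])
    words = []
    for i, part in enumerate(parts):
        if i:
            words.append("e")
        words.append(part)
    return words


def number_to_words(n):
    """Return the Portuguese reading of an integer amount (hypothetical helper)."""
    if not 0 <= n <= 1_000_000:
        raise ValueError("only 0..1_000_000 is supported in this sketch")
    if n == 0:
        return "zero"
    if n == 1_000_000:
        return "um milhão"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        if thousands > 1:  # 1000 is read 'mil', not 'um mil'
            words += below_thousand(thousands)
        words.append("mil")
    if rest:
        # Assumed rule: link with 'e' only for a short or round remainder.
        if thousands and (rest < 100 or rest % 100 == 0):
            words.append("e")
        words += below_thousand(rest)
    return " ".join(words)
```

Under these assumptions, `number_to_words(1234)` gives "mil duzentos e trinta e quatro", while `number_to_words(1100)` gives "mil e cem".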

Numbers such as amounts, telephone numbers, dates, hours, codes and personal identification card numbers require different reading structures. For amounts, the sequence of digits should be converted into hundreds of millions, hundreds of thousands and hundreds of units. A telephone number, however, is read in a different way, for instance as groups of units (example 931720855: nine three one - seven two zero - eight five five). The number of digits in each group depends on the length of the telephone number, but the groups always have three or two digits. Dates have their own written and spoken forms. Among the several date formats, a very common one is ‘dd-mm-yyyy’, which requires the system to recognize a date in the sequence of numbers and to apply the corresponding reading structure. The reading itself can also be done in different ways. For instance, the month can be read as a number or as the name of the month. For hours, the most common format is ‘hh:mm’, but several variations can be found. The way the hour is read varies considerably. For instance, ‘18:50 h’ can be read as ‘eighteen fifty’, ‘six hours and fifty minutes PM’ or ‘ten to seven PM’, among others. For codes and personal identification numbers, different forms can be adopted depending on the number of digits. In these cases a strategy similar to the one mentioned for telephone numbers can be adopted.
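The class-dependent structures described above can be sketched as follows. This is a hedged illustration, not the system described in the paper: `classify` and `group_digits` are hypothetical names, the regular expressions cover only the ‘dd-mm-yyyy’ and ‘hh:mm’ formats named in the text, and the policy of placing the two-digit groups at the end of the number is an assumption. A plain digit string remains ambiguous between an amount, a telephone number and a code, which this sketch does not try to resolve.

```python
import re

# Only the two formats named in the text are recognized here.
DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")
TIME_RE = re.compile(r"\d{1,2}:\d{2}")


def classify(token):
    """Return a coarse class label for a written number (hypothetical helper)."""
    if DATE_RE.fullmatch(token):
        return "date"
    if TIME_RE.fullmatch(token):
        return "time"
    if token.isdigit():
        return "digits"  # amount, telephone number or code: still ambiguous
    return "unknown"


def group_digits(digits):
    """Split a digit string into groups of three or two digits.

    Assumed policy: use groups of three, and finish with one or two
    groups of two when the length is not a multiple of three.
    """
    n = len(digits)
    if n < 2:
        return [digits] if digits else []
    if n % 3 == 0:
        sizes = [3] * (n // 3)
    elif n % 3 == 2:
        sizes = [3] * (n // 3) + [2]
    else:  # n % 3 == 1: avoid a lone digit by ending with two pairs
        sizes = [3] * ((n - 4) // 3) + [2, 2]
    groups, start = [], 0
    for size in sizes:
        groups.append(digits[start:start + size])
        start += size
    return groups
```

For the telephone number used in the text, `group_digits("931720855")` returns the groups "931", "720" and "855", each of which would then be read digit by digit.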

An automatic reading system for general application should first identify the class of each number in order to read it with the correct structure. Then the system has to compose the sequence of words that forms the final sentence. Finally, the system has to convert the sentence to sound, either by a synthesis process or by concatenating a sequence of phonemes or words. Depending on the application, different strategies can be used to read the number. A general purpose Text-to-Speech system could synthesize the sequence of phonemes that makes up the complete number, but a number reading system can simply concatenate the recorded sounds of digits and particles. This last process, although less flexible than a TTS synthesizer, can reach better quality because no segmental processing is required. Even so, the system can be improved by observing several requirements during the recording and cutting of the speech signal, and also by applying some prosodic post-processing. In systems based on the concatenation of recorded digit sounds, the position of the digit within the whole number must be considered. Two different approaches can be used to convey the prosody appropriate to the digit position. The first approach consists of using a recording of the digit made in the same position. This approach requires that several recordings of the same digit be stored in the database of speech sounds. The second approach consists of applying prosodic modifications to the original speech sound files in order to impose the F0 and duration appropriate to the corresponding position. TD-PSOLA algorithms (Charpentier & Moulines, 1990) allow F0 and duration modifications within some limitations. Namely, it is not recommended to raise or lower the F0 and/or duration by a factor of two or more relative to the original, due to the severe loss of speech quality (Teixeira, 2012; Barros, 2002).
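The two approaches above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names and the file naming scheme (`<word>_<position>.wav`) are hypothetical, a single-word number is arbitrarily given the "end" recording, and the factor-of-two limit is taken directly from the recommendation in the text.

```python
def position_label(index, total):
    """Map a word index to one of the three recorded positions."""
    if index == total - 1:
        return "end"  # assumption: a single-word number also uses "end"
    if index == 0:
        return "begin"
    return "middle"


def plan_segments(words):
    """First approach: pick the recording made in the matching position.

    Returns hypothetical file names; no audio is loaded or concatenated.
    """
    total = len(words)
    return [f"{word}_{position_label(i, total)}.wav"
            for i, word in enumerate(words)]


def modification_within_limits(original, target):
    """Second approach: check a requested F0 or duration change.

    Returns True only when the target stays strictly within a factor
    of two of the original value, as recommended for TD-PSOLA.
    """
    if original <= 0 or target <= 0:
        raise ValueError("F0 and duration values must be positive")
    ratio = target / original
    return 0.5 < ratio < 2.0
```

For example, `plan_segments(["mil", "duzentos", "e", "trinta"])` selects the "begin" recording for the first word, "middle" recordings for the two inner words and the "end" recording for the last word, while `modification_within_limits(100.0, 250.0)` returns `False` because the change exceeds a factor of two.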

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech (Klatt, 1987) (Sharman, 1998) (Teixeira, 2013).

Synthesized speech can be created by concatenating segments of recorded/synthesized speech stored in a database of sound or a database of parameters depending on the acoustic processing module (Sproat, 1997).
