This chapter provides practitioners in the field with a set of guidelines for building an automated testing framework that can adequately test automatic speech recognition systems. Throughout the chapter, the testing process of such a system is analyzed from different angles, and methods and techniques well suited to this task are proposed.
1. Introduction
Automatic speech processing is a multidisciplinary field dedicated to the analysis of speech signals. One of its main areas is automatic speech recognition (ASR), which consists of extracting the linguistic information from a speech signal. ASR systems are extremely complex: they must integrate a number of sources of information (acoustic, linguistic, speaker, environmental, and others) and, in like manner, must process a massive number of features in real time. Furthermore, given that the accuracy and reliability of state-of-the-art ASR systems are still lacking in many respects, new algorithms and techniques that apply to almost every module of these systems are developed each year by researchers and published in the corresponding scientific literature. Thus, one must recognize that a speech recognition system is a truly complex software system, which implies that it is prone to development errors. Moreover, to stay competitive in terms of accuracy and real-time performance, it needs to evolve constantly in order to incorporate the latest advances in the field. The combination of these two factors makes automated software testing a must in order to guarantee that the developed system meets the high quality standards required for a product of its kind.
Automated software testing provides the software developer with a comprehensive set of tools and techniques that ease the design and automation of all types of software tests (unit, integration, functional, etc.) across the life-cycle of a software system. The development of speech processing systems and, in particular, automatic speech recognition systems requires fairly extensive knowledge of a number of topics, such as signal processing, statistics, search algorithms, machine learning, phonetics, and linguistics. Thus, speech recognition developers are typically scientists who come from many different areas and who do not necessarily possess a strong background in software development, and even less so in software engineering or automated software testing techniques. For this reason, a number of the speech recognition systems that are made publicly available, a group which includes some of the most widely used systems in the research community, do not meet basic software engineering or software testing principles. This makes it difficult for practitioners to modify or adapt these systems with enough confidence, and sometimes hinders or even prevents the incorporation of the latest techniques necessary to keep them up to date. Thus, it is sometimes the case that a system is not updated as rapidly as it should be (because no one is confident enough to do so) and eventually becomes outdated. This creates the need to build a new system, practically from scratch, instead of updating a perfectly good existing one. What comes into play here is the bad principle of "if something works (reasonably well), do not touch it", as opposed to the good principle of software testing: make your changes confidently and run the necessary tests (which include the regression tests and the newly developed test cases).
We believe that following an adequate automated software testing methodology can overcome these issues and potentially facilitate a more rapid development of the field by providing researchers with clear guidelines for how to automate the testing process of ASR systems, so they can be updated safely and confidently.
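As a minimal sketch of the "make your changes confidently and run the necessary tests" principle described above, the following Python fragment checks that a modified recognizer does not degrade the word error rate (WER) on a fixed test set. It is an illustration under stated assumptions, not part of any particular ASR toolkit: the `recognize` callable, the layout of the test set as (utterance id, reference transcript) pairs, and the `baseline_wer` and `tolerance` parameters are all hypothetical names introduced here.

```python
from typing import Callable, List, Sequence, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(recognize: Callable[[str], str],
               test_set: Sequence[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over the test set.

    `recognize` is a hypothetical stand-in for the system under test: it maps
    an utterance id to the recognized transcript.
    """
    errors = 0
    words = 0
    for utterance_id, reference in test_set:
        ref = reference.split()
        hyp = recognize(utterance_id).split()
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("test set contains no reference words")
    return errors / words


def passes_regression(recognize: Callable[[str], str],
                      test_set: Sequence[Tuple[str, str]],
                      baseline_wer: float,
                      tolerance: float = 0.005) -> bool:
    """True if the current WER is no worse than the stored baseline.

    `baseline_wer` and `tolerance` are assumed to be recorded from a
    previously accepted version of the system.
    """
    return corpus_wer(recognize, test_set) <= baseline_wer + tolerance
```

In practice such a check would be run automatically after every change, with the baseline updated only when an improvement has been reviewed and accepted.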