A Framework for Testing Code in Computational Applications

Diane Kelly (Royal Military College, Canada), Daniel Hook (Engineering Seismology Group, Canada) and Rebecca Sanders (EA Pogo, Canada)
DOI: 10.4018/978-1-61350-116-0.ch007

Abstract

The aim of this chapter is to provide guidance on the challenges of, and approaches to, testing computational applications. Testing in our case is focused on code testing for accuracy as opposed to validating the science models or testing user interfaces. A testing framework is used to present the different challenges. Discussions cover topics such as test oracles and the tolerance problem, testing to address specific goals rather than testing as a process, areas of risk inherent in developing and using computational software, a testing mindset, and the use of technical reviews. Three observational studies are included to illustrate different techniques, problems, and approaches. There is no prescribed way of testing computational code. Instead, an awareness of the risks and challenges inherent in computational software can provide the necessary guidance.

Introduction

Mistakes find their way into all nontrivial pieces of software. Both our experience and published research support this. For example, Les Hatton (1997) conducted a series of experiments in which he found that some scientific programs thought to be “fully tested” (p. 30) harboured serious code faults.

For scientific software to be trusted, its developers must make a reasonable effort to detect and correct the faults in their code. This reality is strongly expressed by Donoho, Maleki, Shahram, Ur Rahman, & Stodden (2009) in an article on reproducible computational research, in which they write:

Many scientists accept computation (for example, large-scale simulation) as the third branch [of science—alongside deductive and empirical branches]...However, it does not yet deserve elevation to third-branch status because current computational science practice doesn’t generate routinely verifiable knowledge. Before scientific computation can be accorded the status it aspires to, it must be practiced in a way that accepts the ubiquity of error, and work then to identify and root out error. (pp. 8-9).

Many activities may be involved in the quest to identify and root out errors in artifacts of scientific processes. For example, to help root out errors in deductive science and mathematics the resulting artifacts (for example, equations) are subjected to peer review. Similarly, computational artifacts should be scrutinized. However, just as artifacts of deductive science cannot be reviewed in the same way as artifacts of empirical science (such as physical measurements), reviews of computational artifacts must be carried out in a way uniquely suited to the principal artifact of the computational process, program code. In this chapter we will focus on two approaches to the review of program code: code testing and technical review. Both of these approaches will be grouped under the umbrella term code scrutinization.

Some topics are not addressed in this chapter. Firstly, we do not discuss the validation of the scientific models that underlie scientific programs. Although it is critical that scientific programs be built from appropriate scientific models, it is also critical that models are realized in code reasonably and accurately. Scientists are experts at evaluating scientific models, but they are not necessarily experts at evaluating codes that realize these models. In our research (Sanders and Kelly, 2009) and work experiences, we have found that strong model validation practices are often not matched by strong code scrutinization practices. For that reason, this chapter avoids discussions of model validation and devotes itself to code scrutinization.

Secondly, we do not discuss numerical methods. Selection of numerical methods, solution techniques, and algorithms can have a strong influence on the accuracy of a program, but it is not our aim to instruct the reader on how to choose appropriate algorithms. Numerous introductory and advanced textbooks already offer good coverage of the topic. However, we encourage strong code scrutinization practices to help scientists discover excessive inaccuracies resulting from weak algorithms.

Thirdly, we do not discuss the testing of routines that interact with the world outside the program. Instead, we focus primarily on the testing of computational engines.

A Note on Terminology

In the remainder of this chapter, when we use the word error we mean the quantitative difference between a measured or calculated value of a quantity and what is considered to be its actual value. To indicate a code mistake we will use the word fault. Note, therefore, that a fault is not an error, but a fault can lead to an error.
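
As a minimal sketch of this distinction, the Python fragment below plants a deliberate fault in a circle-area routine and measures the resulting error against the intended computation. All function names are hypothetical and do not come from the chapter. The sketch also shows an input for which the same fault produces no error at all, which is one reason to keep the two terms apart.

```python
import math

def circle_area(radius):
    """Intended computation: pi * r**2."""
    return math.pi * radius ** 2

def circle_area_faulty(radius):
    """Deliberate fault for illustration: doubles the radius instead of squaring it."""
    return math.pi * radius * 2

def error(calculated, actual):
    """Error in the chapter's sense: calculated value minus the value taken as actual."""
    return calculated - actual

# Radius 3: the fault leads to a nonzero error.
error_r3 = error(circle_area_faulty(3.0), circle_area(3.0))

# Radius 2: r * 2 equals r ** 2, so the fault is present but the error is zero.
error_r2 = error(circle_area_faulty(2.0), circle_area(2.0))

print("error at r=3:", error_r3)
print("error at r=2:", error_r2)
```

A test that happened to use only a radius of 2 would report no error even though the fault is still in the code.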

Description of a Testing Framework

In general, testing is an investigative activity done to improve knowledge about the state of the software under test. Each test is an experimental trial of the software. Tests contribute empirical data required to answer questions about the software. A testing effort will have knowledge goals that tests should fulfill when taken in aggregate.

Key Terms in this Chapter

Software Testing: an investigative activity done to improve knowledge about the state of the software under test; each test is an experimental trial of the software; tests contribute empirical data required to answer questions about the software.

Error: the quantitative difference between a measured or calculated value of a quantity and what is considered to be its actual value.

Scientific Code: application software whose purpose is to answer a scientific or engineering question; this software embodies significant domain knowledge in order to answer the question; the intended recipient of the software’s answer is a knowledgeable human rather than other software or hardware devices.

Test Tolerance: the delta allowed, either explicitly or implicitly, on all output from scientific software in order to judge its correctness; tolerance includes all possible sources of error in the code execution.

Tolerance Problem: the problem of choosing an appropriate tolerance for the evaluation of test outputs.

Code Scrutinization: the review of program code with the goal of finding code faults; techniques used for this review can include both software testing and technical reviews.

Code Fault: a mistake in the code; any code instruction that is different from what was intended.

Testing Oracle: a means by which the expected answer for a computation can be determined; for software testing, an oracle provides expected answers against which the results from the test execution of the software can be compared.

Technical Review: a static exercise of examining code or another software product; when applied to software code, this exercise complements dynamic examinations such as software testing.
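
The terms Testing Oracle, Test Tolerance, and Tolerance Problem above can be illustrated together with a small Python sketch. It rests on assumptions that are not from the chapter: the routine under test is a composite trapezoidal rule, the oracle is the analytic value of the integral of sin over [0, pi], and the faulty variant and all names are invented for illustration. The point is only that the verdict of a test depends on the tolerance chosen.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def trapezoid_faulty(f, a, b, n):
    """Same rule with a deliberate fault: the last interior point is skipped."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n - 1):  # fault: should be range(1, n)
        total += f(a + i * h)
    return total * h

def oracle():
    """Hypothetical oracle: the analytic value of the integral of sin on [0, pi]."""
    return 2.0

def passes(computed, expected, tolerance):
    """A test passes when the absolute error is within the tolerance."""
    return abs(computed - expected) <= tolerance

expected = oracle()
correct = trapezoid(math.sin, 0.0, math.pi, 100)
faulty = trapezoid_faulty(math.sin, 0.0, math.pi, 100)

# Try a loose, a moderate, and a tight tolerance on both versions.
for tolerance in (1e-2, 1e-3, 1e-6):
    print(tolerance,
          "correct code passes:", passes(correct, expected, tolerance),
          "faulty code passes:", passes(faulty, expected, tolerance))
```

In this sketch the loosest tolerance accepts the faulty code, the tightest rejects even the correct code because of its ordinary discretization error, and only the middle value separates the two. Choosing that middle value without already knowing where the fault lies is the tolerance problem.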
