Analysis of any engineered object can be broken down into two categories: black box and white box. In white box analysis, the analyst has full knowledge of the inner workings of the system, while in black box analysis, the analyst is aware only of the system's inputs and outputs. Applied to software, white box analysis means examining the program's source code, while black box analysis means observing how the program executes under different conditions. White box analysis of software requires the source code to be available; therefore, when source code is not available, the only option for program analysis is black box analysis, that is, dynamic analysis performed without the source code.
Good analysis of software should be both sound and complete. Soundness means that the results of the analysis reflect behaviors the program actually exhibits; completeness means that the analysis covers all of the program's behaviors. Dynamic analysis produces sound results because the program is observed while it actually runs. Its completeness, however, is limited: the program may not be tested with every possible input, so some branches of the code may never execute. Static analysis, by contrast, can in principle achieve 100% completeness, because every line of source code can be examined.
Although static analysis therefore appears superior in theory, in the current state of the art it may fail to determine the exact runtime behavior of a program, because the program is never executed in its real environment. A static analyzer must make assumptions about the execution environment, and those assumptions may be incorrect. For example, a static code analyzer may give false confidence that a Java program will never perform a certain behavior, when in reality the Java Virtual Machine may contain an exploitable bug that allows this undesired behavior to occur. Dynamic analysis could potentially detect such a vulnerability, while static analysis may simply not consider it in its idealized model of the Java Virtual Machine. Examples such as this show that although static analysis detects many types of vulnerabilities, and may detect many more in the future, dynamic analysis still plays an important role in detecting vulnerabilities that current static analyzers cannot find.
1.1. Dynamic Analysis Techniques
Because the program must be executed in order to conduct dynamic analysis, some technique is needed to feed information from the running process back to the analyzer. These techniques can be categorized as follows: hooking, dynamic binary instrumentation, virtualization, application level emulation, and whole system emulation (Cesare & Xiang, 2012).
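Hooking refers to intercepting function calls, messages, or events passed between software components (Wikipedia, 2018). Simple dynamic analysis can be done without any dedicated analysis software on Linux and many other Unix-like systems by setting the LD_PRELOAD environment variable, which makes the dynamic linker load a specified shared library before all others, so that the functions it defines take precedence over same-named functions in other libraries. This works only for dynamically linked programs. A simple example is replacing standard C library functions with versions that log debugging information (Izik, 2005).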
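The following is a minimal sketch of such a logging wrapper, assuming Linux with glibc. It hooks `fopen` purely as an illustration; the counter `hook_fopen_calls` is a hypothetical name added so the hook's effect can be checked. The wrapper uses `dlsym(RTLD_NEXT, ...)` to locate the next definition of `fopen` in the lookup order (normally the C library's), records the call, and forwards it unchanged.

```c
#define _GNU_SOURCE /* needed for RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical counter, exported so the hook's effect can be inspected. */
unsigned long hook_fopen_calls = 0;

typedef FILE *(*fopen_fn)(const char *path, const char *mode);

/* Same name and signature as the C library's fopen, so this definition
   takes precedence when the library is loaded via LD_PRELOAD. */
FILE *fopen(const char *path, const char *mode) {
    static fopen_fn real_fopen = NULL;

    /* Resolve the next fopen in the lookup order (normally libc's) once. */
    if (real_fopen == NULL) {
        real_fopen = (fopen_fn)dlsym(RTLD_NEXT, "fopen");
        if (real_fopen == NULL) {
            return NULL; /* no underlying fopen to forward to */
        }
    }

    hook_fopen_calls++;
    FILE *f = real_fopen(path, mode);

    /* The "analysis": log the arguments and the result to stderr. */
    fprintf(stderr, "[hook] fopen(\"%s\", \"%s\") = %p\n",
            path, mode, (void *)f);
    return f;
}
```

Built as a shared library (for example `gcc -shared -fPIC -o libhook.so hook.c -ldl`, where `-ldl` is only needed with glibc versions older than 2.34) and run as `LD_PRELOAD=./libhook.so some_program`, it would print one line for each `fopen` call the program makes. As written, the lazy lookup and the counter are not thread-safe; a real tracer would need to guard them.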
Dynamic binary instrumentation refers to systems that inject arbitrary instructions into, or modify the instructions of, existing binary applications at run time (Hazelwood, Lueck & Cohn, 2009). Intel's Pin is an example of such a system: it allows a developer to insert arbitrary code (written in C or C++) at arbitrary places in an executable (Luk et al., 2005).
Virtualization refers to creating an execution environment that resembles the one the program was designed for, without using software to emulate the complete hardware stack (Kay, 2009). Valgrind, for example, runs the program on a synthetic CPU, translating its machine code through an intermediate representation called VEX; this lets it intercept the system calls and memory accesses required for the analyses it performs (Floyd, 2012).
Application level emulation refers to using software to emulate the environment a single program executes in, without emulating the complete system that program would normally run on. For example, QEMU provides user mode emulation, which allows a binary compiled for a different architecture to be executed on the host operating system (Wiki, 2013); QEMU emulates the program's CPU instructions and passes its system calls, converted as necessary, to the host kernel.
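To make the idea concrete, the sketch below is a minimal interpreter for a hypothetical toy instruction set; the opcodes, register count, and syscall number are all invented for illustration. It mirrors the structure of user mode emulation: the emulator executes every guest instruction itself, so it can trace or count them for analysis, but it forwards the guest's system calls to the host instead of emulating an operating system. QEMU itself translates guest code into host code with its TCG code generator rather than interpreting one instruction at a time, so this loop illustrates the concept, not QEMU's implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy ISA, invented for illustration. */
enum { OP_LOADI, OP_ADD, OP_SYSCALL, OP_HALT };
enum { SYS_WRITE_INT = 1 }; /* hypothetical guest syscall number */
#define NUM_REGS 4

typedef struct {
    uint8_t op;    /* opcode */
    uint8_t a, b;  /* register operands */
    int32_t imm;   /* immediate operand */
} Insn;

typedef struct {
    int32_t reg[NUM_REGS];
    size_t pc;
    unsigned long executed; /* analysis data gathered by the emulator */
} Cpu;

/* Guest system calls are not emulated; they are turned into host
   operations. The guest puts the call number in r0 and the argument
   in r1; other call numbers are ignored in this sketch. */
static void forward_syscall(const Cpu *cpu) {
    if (cpu->reg[0] == SYS_WRITE_INT) {
        printf("%d\n", (int)cpu->reg[1]); /* performed by the host */
    }
}

/* Returns 0 when the guest halts, -1 on an invalid instruction or if
   execution runs past the end of the program. If trace is nonzero,
   each guest instruction is logged to stderr before it executes. */
int emulate(const Insn *prog, size_t len, Cpu *cpu, int trace) {
    cpu->pc = 0;
    cpu->executed = 0;
    while (cpu->pc < len) {
        const Insn *in = &prog[cpu->pc];
        if (in->a >= NUM_REGS || in->b >= NUM_REGS) {
            return -1;
        }
        if (trace) {
            fprintf(stderr, "pc=%zu op=%u\n", cpu->pc, (unsigned)in->op);
        }
        cpu->executed++;
        cpu->pc++;
        switch (in->op) {
        case OP_LOADI:   cpu->reg[in->a] = in->imm;          break;
        case OP_ADD:     cpu->reg[in->a] += cpu->reg[in->b]; break;
        case OP_SYSCALL: forward_syscall(cpu);               break;
        case OP_HALT:    return 0;
        default:         return -1;
        }
    }
    return -1;
}
```

For example, a six-instruction guest program that loads 40 and 2 into registers, adds them, and issues `SYS_WRITE_INT` makes the host print 42, while `cpu.executed` records that six guest instructions ran.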
Whole system emulation refers to using software to emulate the complete hardware platform, so that any program, together with the operating system beneath it, can run on top of the emulated hardware. For example, QEMU provides system mode emulation, which can emulate a wide variety of computer architectures and their common peripherals (Wiki, 2015).
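Unlike an application level emulator, a whole system emulator has no host operating system to forward requests to; it models the hardware itself. The following minimal sketch uses a hypothetical toy machine (the `UART_DATA` address, opcodes, and sizes are invented for illustration). Every guest store passes through an emulated bus, so writes to a memory-mapped device register are handled by a device model and observed by the analyzer, while ordinary stores go to emulated RAM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine, invented for illustration: 256 bytes of
   RAM and one memory-mapped UART transmit register. */
#define RAM_SIZE 256
#define NUM_REGS 2
#define UART_DATA 0xF0 /* hypothetical MMIO address of the UART */

enum { OP_LDI, OP_STORE, OP_HALT }; /* LDI r, imm | STORE r, [addr] | HALT */

typedef struct {
    uint8_t op;
    uint8_t reg; /* register operand */
    uint8_t arg; /* immediate value (LDI) or address (STORE) */
} Insn;

typedef struct {
    uint8_t ram[RAM_SIZE];
    uint8_t reg[NUM_REGS];
    char uart_out[64];         /* bytes "transmitted" by the emulated UART */
    size_t uart_len;
    unsigned long mmio_writes; /* analysis: device accesses observed */
} Machine;

void machine_init(Machine *m) {
    memset(m, 0, sizeof *m);
}

/* Every guest store goes through this emulated bus, so the emulator
   decides whether it reaches RAM or a device model and sees it either way. */
static void bus_write(Machine *m, uint8_t addr, uint8_t value) {
    if (addr == UART_DATA) {
        m->mmio_writes++;
        /* Keep room for the terminating NUL; extra output is dropped. */
        if (m->uart_len + 1 < sizeof m->uart_out) {
            m->uart_out[m->uart_len++] = (char)value;
            m->uart_out[m->uart_len] = '\0';
        }
    } else {
        m->ram[addr] = value; /* addr is at most 255, within RAM_SIZE */
    }
}

/* Returns 0 when the guest halts, -1 on an invalid instruction or if
   execution runs past the end of the program. */
int run_machine(Machine *m, const Insn *prog, size_t len) {
    for (size_t pc = 0; pc < len; pc++) {
        const Insn *in = &prog[pc];
        if (in->reg >= NUM_REGS) {
            return -1;
        }
        switch (in->op) {
        case OP_LDI:   m->reg[in->reg] = in->arg;              break;
        case OP_STORE: bus_write(m, in->arg, m->reg[in->reg]); break;
        case OP_HALT:  return 0;
        default:       return -1;
        }
    }
    return -1;
}
```

A guest that stores 'h' and then 'i' to `UART_DATA` leaves "hi" in `uart_out` and increments `mmio_writes` twice, without the emulator touching the host's I/O at all; an analyzer can inspect those device accesses directly.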