On the Evaluation of Early Warning Models for Financial Crises


Lucia Alessi (European Central Bank, Germany), Carsten Detken (European Central Bank, Germany) and Silviu Oprică (Goethe-University Frankfurt, Germany)
DOI: 10.4018/978-1-4666-9484-2.ch004


Early Warning Models (EWMs) are back on the policy agenda. In particular, accurate models are increasingly needed for financial stability and macro-prudential policy purposes. However, owing to the alleged poor out-of-sample performance of the first generation of EWMs developed in the 1990s, the economic profession remains largely unconvinced about the ability of EWMs to play any important role in the prediction of future financial crises. The authors argue that a lot of progress has been made recently in the literature and that one key factor behind the prevailing skepticism relates to the basic evaluation metrics (e.g. the noise-to-signal ratio) traditionally used to evaluate the predictive performance of EWMs, and in turn to select benchmark models. This chapter provides an overview of methodologies (e.g. the (partial) Area Under the Receiver Operating Characteristic curve and the (standardized) Usefulness measure) better suited to measuring the goodness of EWMs and constructing optimal monitoring tools.
Chapter Preview


Owing to the global financial crisis, Early Warning Models (EWMs) are back on the policy agenda. In particular, we argue that EWMs for financial crises are increasingly needed to support policy decisions in the macro-prudential field, and several key institutions are indeed designing new tools for the early detection of vulnerabilities. This has triggered a renewed interest in academic research on the topic. On the other hand, owing to the alleged poor out-of-sample performance of the first generation of EWMs developed in the 1990s, large parts of the economic profession remain unconvinced of the ability of EWMs to predict financial crises, and therefore of the usefulness of EWMs in supporting the policy process.

There are several good reasons why the academic profession has been skeptical about the contribution of EWMs to predicting financial crises. Eichengreen (2003) states that EWMs have difficulties in dealing with structural relationships which interact in non-linear and state-contingent ways. For example, an asset price misalignment should be interpreted differently if the overvalued assets are held by leveraged and interconnected agents, and the vulnerability is unlikely to increase proportionally with the degree of overvaluation. He further notes that complex systems, like the financial system, often have multiple equilibria, which are sensitive to small perturbations. Finally, he reminds us of the circularity of forecasts/warnings, also known as Goodhart’s Law.1 The latter states that any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes. If, for example, macro-prudential authorities used EWMs and acted upon the signals in an appropriate way, crises would be prevented and no future econometrician would be able to detect the early warning properties of any indicator or model in a reduced-form type of analysis. In summary, Eichengreen highlights that the role of fundamentals in predicting financial crises might be difficult to detect by means of (standard) empirical analysis. Furthermore, the early warning literature suffered in the past from frequent out-of-sample failures, partly caused by in-sample over-fitting and variable selection bias. What this means is that EWMs have often been fitted to explain a particular crisis or set of similar crises, e.g. the Asian current account crisis in 1997, but the indicators and models were selected after the fact. The models and performance measures were not derived in real time and thus oversold the capabilities of such models to predict future crises ex ante.2 Policy makers and academic journal referees alike have not been amused by what often appeared to be exercises in pure data-mining.3

More recently though, a lot of progress has been made on many fronts. Out-of-sample validation methods are increasingly used to select the best performing models. This means that the estimation and evaluation samples are split in order to obtain realistic performance statistics.4 Objective methods are being used to avoid in-sample variable selection bias and to obtain concise model specifications which perform well out-of-sample.5 Real vintage data are not available in most cases, but pseudo real-time exercises are being performed, e.g. by lagging data so that at each point in time only data that would actually have been available at the time of the forecast are used. If trends or deviations from trends (gaps) are used, they are calculated recursively to comply with the same real-time concept.6 EWMs are also now geared towards the use of policy instruments by aiming to predict only those types of crises that a specific policy instrument is meant to prevent or alleviate. This makes the signals received more useful from the policy maker’s perspective.7 A key element is also that EWMs usually do not attempt to predict the outbreak of a crisis, which is nearly impossible to do, as very small random events can trigger a crisis. The trigger might even be unrelated to fundamentals (see Eichengreen, 2003), while the underlying vulnerability is not. EWMs nowadays identify a vulnerable state which tends to lead to a crisis within a certain time span, e.g. within the coming 3 years. This gives more relevance to the prediction and allows policy makers to take relevant action ahead of the crisis.
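The recursive, pseudo real-time gap calculation described above can be sketched as follows. The expanding-window mean is a deliberately simple stand-in for the one-sided detrending methods used in the literature, and the function name and series are purely illustrative:

```python
import numpy as np

def recursive_gap(series, min_window=4):
    """Pseudo real-time gap: at each date t, the trend estimate uses only
    observations up to and including t, as they would have been available."""
    series = np.asarray(series, dtype=float)
    gaps = np.full(series.shape, np.nan)
    for t in range(min_window - 1, len(series)):
        trend_t = series[: t + 1].mean()  # expanding-window mean as a toy trend
        gaps[t] = series[t] - trend_t
    return gaps

# Illustrative indicator series; the last reading sits well above its recursive trend
indicator = [1.0, 1.2, 1.1, 1.3, 1.5, 1.4, 1.6, 1.8, 2.5]
print(recursive_gap(indicator))
```

The crucial point is that the gap at date t never peeks at data after t, so the resulting warning signals could in principle have been computed in real time.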

Key Terms in this Chapter

Partial AUROC: The area under only a portion of the ROC curve, corresponding to a realistic range for the policymaker’s preferences.

Type-2 Error Rate: The fraction of tranquil (non-crisis) periods in which a false alarm is issued (i.e. the false positive rate).

Type-1 Error Rate: The fraction of crisis periods for which no warning is issued (i.e. 1 − the true positive rate).

Confusion Matrix: A table with two columns referring to the outcomes, e.g. ‘crisis’ and ‘no crisis’, and two rows referring to the predictions, e.g. ‘warning issued’ and ‘no warning issued’. Each observation under evaluation, e.g. a country-period, falls in one of the four cells of the matrix.

Usefulness: An evaluation measure indicating the gain from an EWM. It is based on the values of the loss function associated with using the model and with ignoring it.

Noise-to-Signal Ratio: The ratio of false signals to good signals. It is defined as the noise ratio (the fraction of false positives over all non-crisis episodes) divided by the signal ratio (the fraction of correctly predicted crises over all crisis episodes).
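The noise-to-signal ratio follows directly from the four confusion-matrix counts. A minimal sketch, with purely hypothetical counts:

```python
def noise_to_signal(tp, fp, tn, fn):
    """Noise-to-signal ratio from the four confusion-matrix counts."""
    noise = fp / (fp + tn)   # fraction of non-crisis periods with a false alarm
    signal = tp / (tp + fn)  # fraction of crises correctly signalled
    return noise / signal

# Hypothetical example: 8 of 10 crises signalled, 15 false alarms in 90 tranquil periods
print(noise_to_signal(tp=8, fp=15, tn=75, fn=2))  # -> 0.2083...
```

A lower ratio is better; a value at or above 1 means the indicator issues false alarms at least as readily as genuine warnings.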

Policymaker’s Loss Function: In its simplest form, it corresponds to the average of type-1 and type-2 errors, weighted by a parameter linked to the relative preferences of the policymaker between missing crises and issuing false alarms.
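The loss function and the Usefulness measure can be sketched together. This assumes the common formulation in the EWM literature, in which the loss is the preference-weighted average θ·T1 + (1 − θ)·T2 and the benchmark loss from ignoring the model is min(θ, 1 − θ); the parameter names are illustrative:

```python
def loss(theta, t1, t2):
    """Policymaker's loss: preference-weighted average of the two error rates."""
    return theta * t1 + (1 - theta) * t2

def usefulness(theta, t1, t2):
    """Absolute Usefulness: the loss from ignoring the model, min(theta, 1 - theta)
    (always acting or never acting, whichever is cheaper), minus the model's loss."""
    return min(theta, 1 - theta) - loss(theta, t1, t2)

# theta = 0.5: missing a crisis and issuing a false alarm are equally costly
print(usefulness(theta=0.5, t1=0.2, t2=0.3))  # -> 0.25
```

A positive Usefulness means the policymaker is better off using the model than disregarding it; a standardized version divides by min(θ, 1 − θ) to express the gain as a share of the achievable maximum.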

ROC Curve: The acronym stands for Receiver Operating Characteristic curve. It illustrates the performance of an EWM by plotting the true positive rate against the false positive rate associated with various warning thresholds.

AUROC: The acronym stands for Area Under the ROC curve (see above). The AUROC measures the predictive performance of a classifier, with a value of 1 being associated with a perfect model and a value of 0.5 being associated with a random model.
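The AUROC can be computed without tracing the curve at all, via the equivalent rank (Mann-Whitney) formulation. A small sketch with hypothetical indicator readings:

```python
def auroc(scores, labels):
    """AUROC via the rank (Mann-Whitney) formulation: the probability that a
    randomly drawn crisis observation scores above a randomly drawn non-crisis
    observation, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.6, 0.8, 0.4]  # hypothetical indicator readings
labels = [1,   1,   0,   0]    # 1 = crisis within the horizon, 0 = tranquil
print(auroc(scores, labels))   # -> 0.75
```

This rank formulation gives exactly the trapezoidal area under the empirical ROC curve, which is why a random model scores 0.5: its rankings are no better than a coin flip.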

Bootstrap: A statistical technique consisting in randomly re-sampling with replacement the observed sample, in order to generate artificial samples and finally derive a distribution for the statistics of interest.
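The bootstrap procedure can be sketched in a few lines. The data here are purely hypothetical performance statistics, and the function names are illustrative:

```python
import random

def bootstrap_dist(sample, statistic, n_draws=1000, seed=0):
    """Re-sample with replacement and collect the statistic over all draws."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic([sample[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_draws)]

# Hypothetical AUROC estimates; bootstrap the distribution of their mean
aurocs = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74]
dist = sorted(bootstrap_dist(aurocs, statistic=lambda xs: sum(xs) / len(xs)))
print(dist[25], dist[974])  # approximate 95% bootstrap interval
```

The empirical distribution of the re-computed statistic is what allows, for instance, a confidence band around an AUROC estimate or a test of whether it significantly exceeds 0.5.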
