Creating Risk-Scores in Very Imbalanced Datasets: Predicting Extremely Violent Crime among Criminal Offenders Following Release from Prison

Markus Breitenbach (Northpointe Institute for Public Management, USA), William Dieterich (Northpointe Institute for Public Management, USA), Tim Brennan (Northpointe Institute for Public Management, USA) and Adrian Fan (University of Colorado at Boulder, USA)
DOI: 10.4018/978-1-60566-754-6.ch015


In this chapter, the authors explore the Area under the Curve (AUC) as a performance metric suitable for imbalanced data and survey methods for optimizing this metric directly. The authors also address the issue of cut-point thresholds for practical decision-making. The techniques are illustrated by a study that examines predictive rule development and validation procedures for establishing risk levels for violent felony crimes committed after criminal offenders are released from prison in the USA. The “violent felony” category was selected as the key outcome since these crimes are a major public safety concern, have a low base rate (around 7%), and represent the most extreme forms of violence. The authors compare the performance of different algorithms on the dataset and use survival analysis to validate whether the risk scores produced by these techniques are reasonable estimates of the true risk.
Chapter Preview

Introduction And Background

In this chapter, we discuss the benefits of the Area under the Curve (AUC) metric, not only as a performance measure but also as a tool for optimizing models on very imbalanced datasets. We first introduce the measure formally and then discuss a few modeling techniques that can be used specifically for imbalanced datasets, including techniques that optimize the AUC directly. We then discuss how to choose suitable cut-points on an AUC-optimized score and present a case study on predicting violent felony offenses (VFO) in a parole population.
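The formal definition is not part of this preview, so the following is only a minimal sketch of the standard rank-based reading of the AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy scores and labels are hypothetical and are not taken from the chapter's data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up, illustrative data: 1 positive case among 10 (a 10% base rate).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

toy_auc = auc(scores, labels)

# Accuracy of the trivial rule "always predict the majority class".
trivial_accuracy = labels.count(0) / len(labels)

print(f"AUC of the toy score: {toy_auc:.3f}")
print(f"Accuracy of always predicting 'no event': {trivial_accuracy:.3f}")
```

On data like this, the trivial rule reaches high accuracy without ranking anyone, whereas the AUC depends only on how well the score orders positives above negatives. That is one reason the AUC is often preferred when the base rate is low.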


The use of predictive modeling has become pervasive as a decision support tool in criminal justice organizations in the USA. Even before the current shift to technical methods, prediction and classification tasks were central to criminal justice decision-making. Until the late 1970s, most risk estimations regarding criminal offenders were made informally, relying largely on the subjective or “expert” judgment of judges, clinical psychologists, parole boards, etc. These decision-makers usually had access to substantial data on criminal histories and other relevant social and psychological data; however, they generally reached their decisions intuitively, without the aid of any predictive model.

The last two decades have seen a dramatic shift by most national and state criminal justice agencies toward more reliable, objective, data-driven, and formal predictive models. The motivations for this shift included the desire for higher predictive accuracy, for more reliable and defensible procedures that could be justified and replicated, and, at the policy level, a desire to appropriately balance public safety with the competing goals of equity and protecting the rights of prisoners (Gottfredson, 1987; Brennan, 1987). A further motivation was the consistent finding in research studies that statistical and numerical models could systematically outperform human “expert” decision-makers, e.g., judges, prosecutors, and trained prison classification officers, in predictive accuracy (Quinsey, Harris, Rice, & Cormier, 1998b; Grove & Meehl, 1996).

The focus on criminal violence is critical since public safety is among the major goals of correctional agencies. Additionally, the scope of the task of estimating the risk of criminal violence is enormous. In recent years, state prisons in the USA admitted over 600,000 new inmates each year, and almost the same number were released each year from secure facilities. Thus, approximately 1,600 released prisoners were arriving back in communities across the country each day (Petersilia, 2001). Estimating the risk of criminal and violent behavior is therefore a continual challenge for correctional and forensic professional staff. Risk estimations are also needed at several decision points that may involve different decision-makers across criminal justice. For example, probation officers must estimate the risk of future violence when preparing pre-sentence reports for judges. Judges, in turn, face similar predictive questions when making sentencing decisions, i.e., whether the expected risk is so high that an offender should be locked up as an incapacitative or public safety strategy. Likewise, the sheer number of cases and the demand for timely decisions can overwhelm parole boards, which must make and then justify such decisions. The performance and efficiency of numerical decision support risk estimations are thus of considerable value and importance to criminal justice agencies.
