Probabilistic Evaluation of SMS Messages as Forensic Evidence: Likelihood Ratio Based Approach with Lexical Features

Probabilistic Evaluation of SMS Messages as Forensic Evidence: Likelihood Ratio Based Approach with Lexical Features

Shunichi Ishihara (Australian National University, Australia)
DOI: 10.4018/978-1-4666-4006-1.ch010
OnDemand PDF Download:
List Price: $37.50


This study is one of the first likelihood ratio-based forensic text comparison studies in forensic authorship analysis. The likelihood-ratio-based evaluation of scientific evidence has started being adopted in many disciplines of forensic evidence comparison sciences, such as DNA, handwriting, fingerprints, footwear, voice recording, etc., and it is largely accepted that this is the way to ensure the maximum accountability and transparency of the process. Due to its convenience and low cost, short message service (SMS) has been a very popular medium of communication for quite some time. Unfortunately, however, SMS messages are sometimes used for reprehensible purposes, e.g., communication between drug dealers and buyers, or in illicit acts such as extortion, fraud, scams, hoaxes, and false reports of terrorist threats. In this study, the author performs a likelihood-ratio-based forensic text comparison of SMS messages focusing on lexical features. The likelihood ratios (LRs) are calculated in Aitken and Lucy’s (2004) multivariate kernel density procedure, and are calibrated. The validity of the system is assessed based on the magnitude of the LRs using the log-likelihood-ratio cost (Cllr). The strength of the derived LRs is graphically presented in Tippett plots. The results of the current study are compared with those of previous studies.
Chapter Preview


Due to a continuous increase in the use of mobile phones, the short message service (SMS) is more and more becoming a common medium of communication. Unfortunately, its convenience, low cost and high visual anonymity can be exploited, with SMS messages sometimes used in, for example, communication between drug dealers and buyers, or illicit acts such as, extortion, fraud, scams, hoaxes, false reports of terrorist threats, and many more. SMS messages have been reportedly used as evidence in some legal cases (Cellular-news, 2006; Grant, 2007), and it is not difficult to predict that the use of SMS messages as evidence will increase. The development of the SMS Management and Information Retrieval Kit (Baggili, Mohan, & Rogers, 2010) highlights the importance of SMS messages for crime investigation and as evidence.

That being said, there is a large amount of research on forensic authorship analysis in other electronically-generated texts, such as emails (De Vel, Anderson, Corney, & Mohay, 2001; Iqbal, Hadjidj, Fung, & Debbabi, 2008), whereas forensic authorship analysis studies specifically focusing on SMS messages are conspicuously sparse (cf. Ishihara, 2011; Mohan, Baggili, & Rogers, 2010).

The forensic sciences are experiencing a paradigm shift in the evaluation and presentation of evidence (Saks & Koehler, 2005). This paradigm shift has already happened in forensic DNA comparison. Saks and Koehler (2005) fervently suggest that other forensic comparison sciences should follow forensic DNA comparison, which adopts the likelihood-ratio framework for the evaluation of evidence. The use of the likelihood-ratio framework has been advocated in the main textbooks on the evaluation of forensic evidence (e.g., Robertson & Vignaux, 1995) and by forensic statisticians (e.g., Aitken & Stoney, 1991; Aitken & Taroni, 2004). However, despite the fact that the likelihood-ratio framework has started making inroads in other fields of forensic comparison sciences, such as fingerprint (Choi, Nagar, & Jain, 2011; Neumann et al., 2007), handwriting (Bozza, Taroni, Marquis, & Schmittbuhl, 2008; Marquis, Bozza, Schmittbuhl, & Taroni, 2011) and voice (Morrison, 2009), we are somewhat behind in this trend in forensic authorship analysis.

Thus, emulating forensic DNA comparison, the current study is a forensic comparison of SMS messages using the likelihood-ratio framework. Focusing on the lexical features of SMS messages, we test a forensic text comparison system. The validity of the system is assessed using the log-likelihood-ratio-cost function (Cllr) which was originally developed for use in automatic speaker recognition systems (Brümmer & du Preez, 2006), and subsequently adopted in forensic voice comparison (Morrison, 2011). The strength of likelihood ratios (= strength of evidence) obtained from SMS messages is graphically presented using Tippett plots.

Complete Chapter List

Search this Book: