A Likelihood Ratio-Based Forensic Text Comparison in SMS Messages: A Fused System with Lexical Features and N-Grams

Shunichi Ishihara
Copyright: © 2014 | Pages: 17
DOI: 10.4018/978-1-4666-4856-2.ch010


This chapter is built on two studies: Ishihara (2011) “A Forensic Authorship Classification in SMS Messages: A Likelihood Ratio-Based Approach Using N-Grams” and Ishihara (2012) “A Forensic Text Comparison in SMS Messages: A Likelihood Ratio Approach with Lexical Features.” They are two of the first Likelihood Ratio (LR)-based forensic text comparison studies in forensic authorship analysis. The author attribution was modelled using N-grams in the former, whereas it was modelled using so-called lexical features in the latter. In the current study, the LRs obtained from these separate experiments are fused using a logistic regression fusion technique, and the author reports how much improvement in performance the fusion brings to the LR-based forensic text comparison system. The performance of the fused system is assessed based on the magnitude of the fused LRs using the log-likelihood-ratio cost (Cllr). The strength of the fused LRs is graphically presented in Tippett plots and compared with those of the original LRs. The chapter demonstrates that the fused system outperforms the original systems.
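As a rough illustration of the two techniques named in the abstract, the sketch below shows the general shape of a linear (logistic regression) fusion of two log LRs and the standard definition of Cllr. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the offset and the weights are hypothetical, and in practice the offset and weights would be estimated by logistic regression on development data rather than set by hand.

```python
import math


def fuse_log_lrs(log_lr_ngram, log_lr_lexical, offset, w_ngram, w_lexical):
    """Fuse two log LRs as a weighted sum plus an offset.

    The offset and weights are hypothetical placeholders; logistic
    regression fusion would learn them from development data.
    """
    return offset + w_ngram * log_lr_ngram + w_lexical * log_lr_lexical


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost for LRs whose ground truth is known.

    Same-author LRs are penalised when they are small, different-author
    LRs when they are large. Lower values indicate better performance.
    """
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)
    diff_penalty = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_penalty /= len(different_author_lrs)
    return 0.5 * (same_penalty + diff_penalty)


# A system that always outputs LR = 1 carries no information: Cllr is 1.
uninformative = cllr([1.0], [1.0])

# Illustrative (made-up) LRs that point strongly in the correct direction.
informative = cllr([100.0, 50.0], [0.01, 0.02])
```

A Cllr close to 0 reflects LRs that are both strong and correct, a value of 1 corresponds to a system that provides no information, and values above 1 indicate LRs that are misleading on balance.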
Chapter Preview

2. General Introduction

Due to a continuous increase in the use of mobile phones, the short message service (SMS) is increasingly becoming a common medium of communication. Unfortunately, its convenience, low cost and high visual anonymity can be exploited, with SMS messages sometimes used in, for example, communication between drug dealers and buyers, or in illicit acts such as extortion, fraud, scams, hoaxes, false reports of terrorist threats, and many more. SMS messages have reportedly been used as evidence in some legal cases (Cellular-news 2006, Grant 2007, 2010), and it is not difficult to predict that the use of SMS messages as evidence will increase.

That being said, while there is a large amount of research on forensic authorship analysis of other electronically generated texts, such as emails (De Vel et al. 2001, Iqbal et al. 2008), forensic authorship analysis studies focusing specifically on SMS messages are conspicuously sparse (Grant 2010, Mohan et al. 2010).

The forensic sciences are experiencing a paradigm shift in the evaluation and presentation of evidence (Saks and Koehler 2005). This paradigm shift has already happened in forensic DNA comparison. Saks and Koehler (2005) fervently suggest that other forensic comparison sciences should follow forensic DNA comparison, which adopts the LR framework for the evaluation of evidence. The use of the LR framework has been advocated in the main textbooks on the evaluation of forensic evidence (e.g. Robertson and Vignaux 1995) and by forensic statisticians (e.g. Aitken and Stoney 1991, Aitken and Taroni 2004). However, although the LR framework has started making inroads in other fields of forensic comparison, such as forensic voice comparison (Morrison 2009), which is perhaps the closest to our field, forensic authorship analysis is somewhat behind in this trend.
