Crime Profiling System

Hussein Zedan (Applied Science University, Bahrain) and Meshrif Alruily (De Montfort University, UK)
DOI: 10.4018/978-1-4666-6583-5.ch007

Abstract

Digital forensics aims to examine a wide range of digital media in a forensically sound manner. This can be used to uncover the rationale for a committed crime and identify possible suspects, to prevent a crime from taking place, or to identify a threat so that it can be dealt with. The latter is firmly rooted within the domain of intelligence countermeasures. The authors call the outcome of the analyses subject profiling, where a subject can be a threat or a suspect. In this chapter, the authors outline a process for profiling based on the Self-Organizing Map (SOM) and evaluate the technique by profiling crimes using a multi-lingual corpus. The development and application of a Crime Profiling System (CPS) is also presented. The system is able to extract meaningful information (type of crime, location, and nationality) from Arabic-language crime news reports. The system has two unique attributes: first, information extraction depends on local grammar, and second, dictionaries are generated automatically. It is shown that the CPS improves the quality of the data through reduction, where only meaningful information is retained. Moreover, when clustering using the Self-Organizing Map (SOM), efficiency is gained because the data has been cleansed by removing noise. The proposed system is validated through experiments using a corpus collated from different sources; precision, recall, and F-measure are used to evaluate the performance of the proposed information extraction approach. Comparisons with other systems are also conducted.

Background

With the advent of the Internet, the volumes of data in electronic form have become huge. As a result, a great deal of the information published on the Internet remains effectively hidden within large bodies of unstructured data.

The challenge that needs addressing is how to find specific knowledge about a particular subject with a high degree of accuracy. According to Noklestad (2009), there are three different strategies: information retrieval, information extraction, and question answering.

According to Fan et al. (2006), the basic process in analyzing textual data is information extraction, and it is particularly useful when dealing with vast volumes of text. Extracting specific types of information from particular domains began in the late 1980s, when the Defense Advanced Research Projects Agency (DARPA) initiated a series of Message Understanding Conferences (MUC). DARPA's MUC described information extraction as “a task involving the extraction of specific, well-defined types of information from natural language texts in restricted domains, with the specific objective of filling pre-defined template slots and databases” (Ruch et al., 2005). In MUC-1 (1987) and MUC-2 (1989), messages about naval operations were studied. MUC-3 (1991) and MUC-4 (1992) focused on news reports about terrorist events, and MUC-5 (1993) and MUC-6 (1995) investigated news articles about joint ventures and management changes (Witten, 2004). In MUC-7 (1997), the task was to fill templates by identifying missile and rocket launch events from news articles published by the New York Times (Witten, 2004; Borthwick, 1999; Cowie and Lehnert, 1996). For example, in Figure 1, the task was to extract where the rocket was launched, the rocket's owner, the owner of the payload, and the date.

Figure 1.

The outcome of information extraction for filling template slots
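The slot-filling task described above can be sketched, as a toy illustration, with a handful of regular expressions acting as a simple local grammar. The slot names, patterns, and the sample sentence below are illustrative assumptions, not the MUC-7 task definition or the chapter's own method:

```python
import re

# Toy "local grammar": regular expressions that fill pre-defined
# template slots from a launch report (illustrative patterns only).
SLOT_PATTERNS = {
    "launch_site": re.compile(r"launched from (?P<v>[A-Z][\w ]+?)(?:,| on| by|\.)"),
    "rocket_owner": re.compile(r"by (?P<v>[A-Z][\w ]+?)(?:,| on|\.)"),
    "payload_owner": re.compile(r"carrying a satellite for (?P<v>[A-Z][\w ]+?)(?:,|\.)"),
    "date": re.compile(r"on (?P<v>\w+ \d{1,2}, \d{4})"),
}

def fill_template(text):
    """Return a dict mapping each slot to its extracted value (None if absent)."""
    template = {}
    for slot, pattern in SLOT_PATTERNS.items():
        m = pattern.search(text)
        template[slot] = m.group("v") if m else None
    return template

report = ("An Ariane rocket was launched from Kourou by Arianespace "
          "on June 12, 1997, carrying a satellite for Intelsat.")
print(fill_template(report))
# → {'launch_site': 'Kourou', 'rocket_owner': 'Arianespace',
#    'payload_owner': 'Intelsat', 'date': 'June 12, 1997'}
```

Real systems replace such fixed patterns with richer grammars and named-entity recognizers, but the template-filling structure is the same: each slot either receives a value found in the text or remains empty.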

Key Terms in this Chapter

Self-Organizing Map: A well-known, classic clustering technique: an unsupervised neural network that maps high-dimensional data onto a low-dimensional grid while preserving the topology of the input space.

Profiling: Techniques used to build knowledge about a suspect. The outcome is a profile containing evidence, analyses, and argumentation.

Information Extraction: The automatic identification of specific, structured information, such as entities and relationships, from a set of unstructured data.

Pattern Recognition: Techniques to identify a given pattern in a heterogeneous corpus. This pattern could be in any medium.

Clustering: A mechanism to classify data, information, and/or knowledge according to various similarity measures.

Syntactic Analysis: In the linguistic domain, the analysis of the structure of text with the aid of a grammar. Within the domain of computer languages, syntactic checks are performed at the compilation stage.
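The Self-Organizing Map defined above can be sketched minimally as follows. The toy data, grid size, and decay schedules are illustrative assumptions and not the chapter's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 2-D clusters (illustrative only).
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                  rng.normal(1.0, 0.1, (50, 2))])

# A small 3x3 SOM: each grid node carries a 2-D weight vector.
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
weights = rng.random((9, 2))

def train(weights, data, epochs=20, lr0=0.5, sigma0=1.5):
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.1  # shrinking neighborhood radius
        for x in data:
            # Best-matching unit: grid node whose weights are closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighborhood on the grid around the BMU.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))
            # Pull all nodes toward x, weighted by grid proximity to the BMU.
            weights += lr * h[:, None] * (x - weights)
    return weights

weights = train(weights, data)
# After training, each input is assigned to its best-matching unit;
# nearby grid nodes cover nearby regions of the data.
bmus = np.argmin(((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1), axis=1)
```

Because the neighborhood function preserves topology, inputs from the same cluster tend to land on the same or adjacent grid nodes, which is what makes the SOM usable as a clustering and visualization tool.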
