Introduction
In any criminal investigation, reconstructing the order of the events leading up to a crime is of capital importance. Capturing that order is difficult for several reasons, chief among them: (1) information about the events is distributed across different sources, and (2) that information is often heterogeneous. For example, it may consist of video from CCTV cameras, text from emails, photographs on social networking sites, and structured data from logs and databases. These data have to be merged so that meaningful conclusions can be drawn about the crime.
In this paper, we consider only textual data coming from different sources. Such a situation arises when individual reports about a crime are gathered, for example, from different eyewitnesses. Individual accounts of the events leading to a crime can differ in a number of ways. Firstly, people naturally use different vocabulary, and they may also use different tenses when describing events. Secondly, people may be privy to only a subset of the events associated with the crime and, hence, may only give accounts of the parts they have witnessed. Thirdly, interested parties may intentionally provide incorrect information to mislead the investigation; indeed, groups of individuals may collude to confound it. Eyewitnesses may also be misled by investigators. Fourthly, individuals may provide different levels of detail about the observed events. All of these factors make it difficult to automatically extract events, match them across different textual documents, and identify the actors associated with them. Thus, a methodology is needed that first discovers and then orders the events of interest.
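Suppose, hypothetically, that the discovery and matching steps have already mapped each witness's descriptions to shared event identifiers. The ordering step can then be sketched as merging partial sequences. The Python sketch below is an illustrative baseline, not the method developed in this paper: it assumes each witness reports events in the order they happened, treats every consecutive pair in an account as a happens-before constraint, and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to return one order consistent with all accounts. The function name `merge_accounts` and the event labels `A` to `D` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def merge_accounts(accounts: list[list[str]]) -> list[str]:
    """Merge per-witness event sequences into a single global order.

    Each account lists (already matched) event IDs in the order one
    witness reported them. Every consecutive pair becomes a
    "happens-before" constraint, and a topological sort returns one
    order that satisfies all constraints. Raises CycleError when the
    accounts contradict each other.
    """
    sorter = TopologicalSorter()
    for account in accounts:
        for event in account:
            # Register every event, even one mentioned by a single witness.
            sorter.add(event)
        for earlier, later in zip(account, account[1:]):
            # `later` depends on (must come after) `earlier`.
            sorter.add(later, earlier)
    return list(sorter.static_order())


if __name__ == "__main__":
    # Hypothetical accounts: each witness saw only part of the sequence.
    accounts = [["A", "B", "D"], ["B", "C"], ["C", "D"]]
    print(merge_accounts(accounts))

    # Contradictory accounts (e.g. from a misleading witness) form a cycle.
    try:
        merge_accounts([["A", "B"], ["B", "A"]])
    except CycleError:
        print("conflicting accounts detected")
```

When the accounts leave two events unconstrained, `static_order()` returns just one of several valid orders, so a practical system would also need to report that ambiguity. A cycle signals contradictory testimony, which may stem from misleading or colluding witnesses, although this sketch cannot tell which account is wrong.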
Event Discovery
To illustrate some of the difficulties in discovering events, we consider examples obtained from the mock eyewitness task used to validate the method presented in this paper. Three simple events that were observed are (1) the robber flicking through a photo album, (2) shutting the door to the garden, and (3) taking a coke from the fridge. Our subjects described these common events differently, as shown below:
• Event: Flicking through a photo album
  ◦ “watched TV and flicked through a photo album”
  ◦ “opens the TV and watches some pictures in a photo album”
  ◦ “switches the TV on and flicks through photo albums whilst sitting on a sofa”
• Event: Shutting the door to the garden
  ◦ “he shuts and locks the back doors of the kitchen”
  ◦ “he then closes and locks the door to the garden”
  ◦ “fixed the light, after this he closed the back door”
• Event: Taking a coke from the fridge
  ◦ “goes to the kitchen where he drinks a Coca-Cola (he takes it from the fridge)”
  ◦ “After that, he took a coke from the fridge and looked into the kitchen cabinets”
  ◦ “got a can of coke from the fridge before taking an apple and eating it.”
  ◦ “helped himself to some coke in the fridge”
The objective of this paper is to develop an algorithm that identifies these ten descriptions as three distinct real-world events and infers the temporal ordering relationships between them. The examples above capture the inherent difficulty of identifying relevant events in a set of textual data sources, which stems from (1) the disparity in how common events are described, (2) the level of detail given for each event, and (3) the context against which each event is set.
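To make these difficulties concrete, the following Python sketch applies a deliberately naive baseline to the ten descriptions; it is not the algorithm developed in this paper. It assumes a hand-picked stop-word list, Jaccard similarity between sets of lower-cased words, and a greedy single-pass clustering in which a description joins the first existing group containing any member at or above a similarity threshold. The names `tokens`, `jaccard`, `cluster`, `STOPWORDS` and `DESCRIPTIONS` are hypothetical.

```python
import re

# Hand-picked stop words (an assumption, chosen only for these ten sentences).
STOPWORDS = {
    "a", "an", "and", "after", "before", "from", "he", "himself", "in",
    "into", "it", "of", "on", "some", "that", "the", "then", "this",
    "through", "to", "where", "whilst",
}

# The ten eyewitness descriptions quoted above, in the order listed.
DESCRIPTIONS = [
    "watched TV and flicked through a photo album",
    "opens the TV and watches some pictures in a photo album",
    "switches the TV on and flicks through photo albums whilst sitting on a sofa",
    "he shuts and locks the back doors of the kitchen",
    "he then closes and locks the door to the garden",
    "fixed the light, after this he closed the back door",
    "goes to the kitchen where he drinks a Coca-Cola (he takes it from the fridge)",
    "After that, he took a coke from the fridge and looked into the kitchen cabinets",
    "got a can of coke from the fridge before taking an apple and eating it.",
    "helped himself to some coke in the fridge",
]


def tokens(text: str) -> set[str]:
    """Lower-case the text, keep alphabetic words, drop stop words (no stemming)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(descriptions: list[str], threshold: float) -> list[list[str]]:
    """Greedy single pass: join the first group with any member >= threshold."""
    groups: list[list[tuple[str, set[str]]]] = []
    for text in descriptions:
        words = tokens(text)
        for group in groups:
            if any(jaccard(words, other) >= threshold for _, other in group):
                group.append((text, words))
                break
        else:
            # No sufficiently similar group exists: start a new one.
            groups.append([(text, words)])
    return [[text for text, _ in group] for group in groups]


if __name__ == "__main__":
    for threshold in (0.10, 0.12, 0.15):
        sizes = [len(g) for g in cluster(DESCRIPTIONS, threshold)]
        print(f"threshold={threshold:.2f}: group sizes {sizes}")
```

Under these assumptions, a threshold of 0.12 happens to recover the three events, but 0.10 pulls three of the four coke descriptions into the garden-door group through the shared word “kitchen”, and 0.15 splits the three garden-door descriptions into singleton groups. Because no stemming is applied, “flicked”/“flicks” and “door”/“doors” never match; the first garden-door description and the second coke description (different events) score 0.10, higher than the roughly 0.08 between the first and third coke descriptions (the same event). This sensitivity illustrates why disparate vocabulary, varying detail and shared context make purely lexical matching unreliable.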