An Anonymous Email Identification Solution based on Writing Structural Patterns

Yanhua Liu (College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China and Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou, China), Guolong Chen (College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China and Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou, China) and Yiyun Zhang (College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China and Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou, China)
Copyright: © 2015 |Pages: 13
DOI: 10.4018/IJGHPC.2015040103

Abstract

This paper presents a method for analyzing anonymous emails in digital forensics. The method uses the frequent pattern growth (FP-growth) algorithm to analyze a user's emails and obtain that user's structural email writing patterns. Because different writing structural patterns influence the analysis of an anonymous email to different degrees, the analytic hierarchy process is used to calculate a weight for each of the user's patterns. For a given anonymous email, matching it against the writing structural patterns and applying the calculated weights can help investigators improve their decision making and determine the author of the email in forensic work.
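The weighting step can be illustrated with a minimal sketch. It assumes a hand-written pairwise comparison matrix over three hypothetical pattern categories (greeting, signature, paragraph layout) and uses the row geometric-mean approximation of the analytic hierarchy process. The category names, the matrix values, and the `match_score` helper are illustrative assumptions and are not taken from the paper.

```python
import math

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    if any(len(row) != n for row in pairwise):
        raise ValueError("pairwise comparison matrix must be square")
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(pairwise, weights):
    """Return CI / RI; values below about 0.1 are usually considered acceptable."""
    n = len(pairwise)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def match_score(weights, matched):
    """Hypothetical score: sum of the weights of the pattern categories that match."""
    return sum(w for w, hit in zip(weights, matched) if hit)


if __name__ == "__main__":
    # Assumed judgments: greeting vs. signature vs. paragraph layout.
    matrix = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    weights = ahp_weights(matrix)
    print("weights:", weights)
    print("consistency ratio:", consistency_ratio(matrix, weights))
    print("score:", match_score(weights, [True, False, True]))
```

The geometric-mean method is used here only because it needs no linear algebra library; the principal-eigenvector method is the other common way to derive the weights, and the two agree when the matrix is perfectly consistent.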
Article Preview

Numerous studies have focused on authorship attribution based on writing features, and many authorship identification problems on literary works have been solved successfully in these studies (Mehri et al., 2012).

For MS Office documents, Fu (2011) proposed a forensic method that uses the unique value of the revision identifier (RI) to identify the source of suspicious electronic documents in forensic investigations. Using the RI and other document properties, investigators can determine whether a suspicious document and another document come from the same source. Jacques Savoy (2013) described the authorship attribution problem and proposed a method based on LDA. When more terms are used, the LDA-based scheme can perform better in some respects than the KLD model and the naïve Bayes model. Liu et al. (2013) developed a Semi-RS algorithm for writeprint identification. Semi-RS takes into account the distribution of each author's writeprint in feature space. Experimental results show that Semi-RS outperforms conventional random subspace methods and reaches a high accuracy rate. To detect fraudulent emails, Sarwat et al. (2014) extracted various kinds of features from email content and compared the categories of features with one another in terms of fraudulent email detection rate, achieving an accuracy as high as 96%.

It should be noted that identifying email authorship differs substantially from identifying spam. Spam identification usually depends on an existing spam database. In contrast, different messages from the same email account may focus on different topics, which makes it difficult to count the most commonly used word sets.

Compared with common MS Office documents and literary works, email writing is not bound by spelling and grammar and involves various writing structures. Thus, email writing features differ from those of common documents: more features exist, and extracting and analyzing an author's email writing pattern is more difficult.

O. de Vel (2001) carried out the initial research on email authorship, attempting to identify the author of an anonymous email by analyzing the language and structural features of emails. Text classification is the common method for analyzing anonymous emails (Abbasi et al., 2006). One way to implement it is to classify emails by the information in their content. Another way is to classify them by their writing features (Nizamani et al., 2013). The second way is widely used to detect spam.
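As a rough illustration of what structural features of an email might look like in practice, the following sketch extracts a few simple ones from a message body. The feature set (greeting, farewell, paragraph count, blank-line count) and the keyword lists are assumptions chosen for the example; they are not the features used by de Vel or by the authors.

```python
import re

# Hypothetical keyword lists; a real system would need far broader coverage.
GREETING = re.compile(r"^\s*(hi|hello|dear|hey)\b", re.IGNORECASE)
FAREWELL = re.compile(r"^\s*(regards|best|cheers|thanks|sincerely)\b", re.IGNORECASE)


def structural_features(body):
    """Return a dict of simple structural features for one email body."""
    lines = body.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    # Paragraphs are runs of text separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", body.strip()) if p.strip()]
    return {
        # Greeting is checked only on the first non-empty line.
        "has_greeting": bool(non_empty) and bool(GREETING.match(non_empty[0])),
        # Farewell is checked on the last three non-empty lines.
        "has_farewell": any(FAREWELL.match(ln) for ln in non_empty[-3:]),
        "paragraph_count": len(paragraphs),
        "blank_line_count": len(lines) - len(non_empty),
    }


if __name__ == "__main__":
    sample = "Hi Bob,\n\nThe report is attached.\nPlease review.\n\nRegards,\nAlice\n"
    print(structural_features(sample))
```

Feature dictionaries of this kind could then be discretized into items and fed to a classifier or a frequent pattern miner, which is the general direction the surveyed work takes.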
