Email Classification for Forensic Analysis by Information Gain Technique

Dhai Eddine Salhi, Abdelkamel Tari, Mohand Tahar Kechadi
DOI: 10.4018/IJSSCI.2021100103


Forensics is one of the most interesting fields today. It is based on the work of scientists who study evidence to help the police solve crimes. In computer forensics, the crimes are usually network attacks, and most of these attacks are carried out over email, which is the case considered in this study. Email has become a daily means of communication that is mainly accessed via the Internet. People receive thousands of emails in their inboxes and on the mail servers that store them. The aim of this study is to protect email users by building an automatic checking and detection system on servers that separates bad emails from good ones. In this paper, the authors present a study based on a new email clustering method for extracting the bad and good emails. The authors use the information gain technique as a clustering algorithm, whose principle is to calculate the importance of each attribute (in this study, the attributes that constitute an email), draw the importance tree, and finally extract the clusters.

2. Background

2.1. Emails

Email was invented by Ray Tomlinson in 1972 (Mullet, 2000). The principle of email usage is relatively simple, which is what quickly made it the main service used on the Internet. As with the conventional postal service, you only need to know the address of the recipient to send a message (Mullet, 2000). The two main advantages of email are its speed of transmission and its reduced cost (the overall cost of an Internet connection) (Sellers, 1980).

2.1.1. Email structure

In 1982, a standard format for Internet email was defined. It is based on certain conventions adopted by Ray Tomlinson (Hazel, 2001), updated to reflect the modern state of the Internet. SMTP (Simple Mail Transfer Protocol) and POP3 (Post Office Protocol 3) are the underlying technologies behind the transfer of email from one host to another. In the early days of the Internet, electronic mail systems used special protocols to transfer email from one proprietary system to another. An email is composed of two parts (McDonald, 2009): the header and the message body.
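The two-part structure described above can be illustrated with a minimal Python sketch that uses the standard library `email` package. The raw message, the addresses, and the helper name `split_email` are hypothetical and chosen only for illustration; this is not the parsing procedure used by the authors.

```python
from email import message_from_string

# Hypothetical raw message: header lines, one blank line, then the body.
RAW_EMAIL = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Meeting notes\n"
    "\n"
    "Hello Bob,\n"
    "The notes will follow tomorrow.\n"
)


def split_email(raw):
    """Return (headers, body) for a simple, non-multipart message."""
    msg = message_from_string(raw)
    headers = dict(msg.items())   # header fields as name -> value
    body = msg.get_payload()      # plain string for a non-multipart message
    return headers, body


headers, body = split_email(RAW_EMAIL)
print(headers["Subject"])
print(body)
```

The header fields are the kind of attributes a classifier could inspect separately from the body text.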

2.1.2. MIME Standard

MIME (Multipurpose Internet Mail Extensions) is a standard that was proposed by Bell Laboratories in 1991 to extend the limited possibilities of email and, in particular, to allow documents (images, sounds, text, etc.) to be inserted in an email. It was originally defined by RFCs 1341 and 1342 in June 1992 (Smith, 2006). MIME messaging has the following features:

  • Ability to have multiple objects (attachments) in a single message;

  • An unlimited message length;

  • The use of character sets (alphabets) other than ASCII;

  • The use of rich text (formatting of the messages, fonts, colors, etc.);

  • The binary attachments (executable, images, audio or video files, etc.), possibly with several parts.
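As a rough illustration of the features listed above, the following Python sketch builds a MIME message with the standard library `email.message.EmailMessage` class. The addresses, subject, file name, and attachment bytes are invented for the example, and the helper name `build_mime_message` is hypothetical.

```python
from email.message import EmailMessage


def build_mime_message():
    """Build a message with non-ASCII text, rich text, and a binary part."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Rapport d'été"  # non-ASCII characters in a header

    # Plain-text body containing characters outside ASCII.
    msg.set_content("Bonjour, voici le rapport d'été.")

    # Rich-text (HTML) alternative of the same body.
    msg.add_alternative(
        "<p>Bonjour, voici le <b>rapport</b> d'été.</p>", subtype="html"
    )

    # Binary attachment; these bytes are only the PNG signature, used as
    # placeholder data rather than a real image.
    msg.add_attachment(
        b"\x89PNG\r\n\x1a\n",
        maintype="image",
        subtype="png",
        filename="logo.png",
    )
    return msg


msg = build_mime_message()
parts = [part.get_content_type() for part in msg.walk()]
print(parts)
```

Walking the message lists the content type of every part, which shows how several objects coexist in a single email.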
