A New Meta-Heuristic based on Human Renal Function for Detection and Filtering of SPAM


Mohamed Amine Boudia (GeCoDe Laboratory, Department of Computer Science, Tahar Moulay University of Saida, Saida, Algeria), Reda Mohamed Hamou (GeCoDe Laboratory, Department of Computer Science, Tahar Moulay University of Saida, Saida, Algeria) and Abdelmalek Amine (GeCoDe Laboratory, Department of Computer Science, Tahar Moulay University of Saida, Saida, Algeria)
Copyright: © 2015 |Pages: 33
DOI: 10.4018/IJISP.2015100102

Abstract

E-mail is one of the most widely used means of communication because of its efficiency and profitability. In the last few years, undesirable e-mails (SPAM) have spread widely and now take up a large part of the inbox. Consequently, several recent studies have provided evidence that the detection and filtering of SPAM is of major interest to the Internet community. In the present paper, the authors propose and experiment with a new and original meta-heuristic based on the renal system for the detection and filtering of SPAM. The natural model of the renal system is taken as inspiration for its purification of the blood, its filtering of toxins and its regulation of blood pressure. The messages are represented by both a bag-of-words and an N-gram method, which is language-independent, because an e-mail can be received in any language. The authors then propose two models for applying a Bayesian classification to textual data: the Bernoulli model and the Multinomial model.

1. Introduction And Problematic

The appearance of the Internet and the incredibly rapid development of telecommunication technology have made the world a global village. The Internet has become a major channel for communication. E-mail is one of the communication tools that Internet users rely on, as it is available free of charge and supports the transfer of files.

According to the most recent report of the Radicati Group (2014), which provides quantitative and qualitative research on e-mail, security, instant messaging (IM), social networks, data archiving, regulatory compliance, wireless technologies, Web technologies and unified communications, there were:

  • 4.116 billion active e-mail accounts in the world.

  • 2.504 billion people who use e-mail regularly, a number expected to exceed 2.8 billion in 2018.

  • 196.3 billion e-mails sent per day on average worldwide in 2014. This number will increase to 227.7 billion in 2018.

  • 1.6 accounts held per person on average, which should increase to 1.8 in four years.

According to the same reports of the Radicati Group, unsolicited mail, or SPAM, can reach more than 89.1%, with 262 million spam messages a day. In 2009, about 81% of the e-mails sent were SPAM. Consequently, spamming has become a global phenomenon. For the CNIL (the French National Commission for Computing and Liberties), "spamming" or "spam" is "to send massive and sometimes repeated electronic mail, not requested, to people with whom the sender has had no contact and whose e-mail addresses he has captured in an irregular way."

The above statistics show that the detection and filtering of spam is a major issue for the Internet community and therefore a crucial task.

The literature gives two broad approaches to the filtering and detection of SPAM: approaches based on machine learning and approaches not based on machine learning. The first relies on feature selection, an important stage in classification systems, which aims to reduce the number of features while preserving or improving the performance of the classifier used. The second relies on many existing techniques and algorithms: content analysis, block-lists, black-lists and white-lists, mailbox authentication, heuristics and, finally, meta-heuristics.

Even though it is usually easy for a human to decide whether a message is spam or non-spam, we cannot tackle SPAM by sorting e-mail manually, because the number of e-mails in circulation quoted above is extremely large.

In the human body, a process essential to survival occurs automatically: the purification of the blood by the renal system. A human can die if the level of toxins and unwanted substances in the blood exceeds some threshold; the renal system purifies and filters the blood automatically, in a delicate and precise way. The regulation of blood pressure is another role of the renal system.

We propose a new approach inspired by the renal system for the detection and filtering of SPAM, hybridizing both approaches (based and not based on machine learning). Several techniques are used within the same SPAM filtering system, including content analysis, blacklists and whitelists. Another part of our approach controls the flow of e-mails, which mirrors one of the roles of the renal system (blood pressure regulation), to minimize the risk of DDoS (distributed denial-of-service) attacks.
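As a rough illustration of how list-based filtering and flow control could sit in front of content analysis, the sketch below routes each incoming message by sender. It is not the renal-inspired algorithm itself, which this excerpt does not detail: the class `FlowRegulator`, the function `triage`, the sliding-window limit, the order of the checks and the example addresses are all assumptions made for the example.

```python
from collections import deque


class FlowRegulator:
    """Sliding-window limiter (an assumed mechanism, not the paper's):
    at most max_messages per window_seconds for each sender."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self.history = {}  # sender -> deque of accepted arrival times

    def allow(self, sender, now):
        times = self.history.setdefault(sender, deque())
        # Forget arrivals that have left the time window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.max_messages:
            return False
        times.append(now)
        return True


def triage(sender, now, whitelist, blacklist, regulator):
    """Return 'reject', 'throttle', 'accept' or 'analyze' for one message."""
    if sender in blacklist:
        return "reject"
    if not regulator.allow(sender, now):
        return "throttle"  # flow control: sender exceeds the allowed rate
    if sender in whitelist:
        return "accept"
    return "analyze"  # hand over to content analysis or a learned classifier


# Hypothetical usage: two messages per 60 seconds are allowed per sender.
regulator = FlowRegulator(max_messages=2, window_seconds=60.0)
whitelist = {"colleague@example.com"}
blacklist = {"bulk@example.net"}
decisions = [
    triage("stranger@example.org", t, whitelist, blacklist, regulator)
    for t in (0.0, 1.0, 2.0, 61.0)
]
```

In this sketch the third message from the same sender is throttled, and the sender is served again once the earlier arrivals have left the window. Only messages that pass the lists and the rate check reach the more expensive content analysis stage.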

Our approach combines the positive properties of different filtering techniques, deployed at various levels, in a single hybrid system.

The best filter that can be found in nature is the human kidney. This is one of the motivations that led us to draw inspiration from this natural filter. If we can mimic this phenomenon, we expect to obtain convincing results in the field of spam filtering.

2. Materials And Methods

To apply the Naive Bayes algorithm to textual data, we must use one of the following document models:

  • Bernoulli document model: binary vector.

  • Multinomial document model: frequency vector.

    (1)
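The difference between the two document models can be made concrete with a minimal sketch. The whitespace tokenizer, the helper names and the two sample messages are assumptions chosen for illustration; the source only states that the Bernoulli model uses a binary vector and the Multinomial model a frequency vector.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Sorted list of the distinct terms found across all documents."""
    return sorted({term for doc in documents for term in tokenize(doc)})


def bernoulli_vector(text, vocabulary):
    """Binary vector: 1 if the term occurs in the document, 0 otherwise."""
    present = set(tokenize(text))
    return [1 if term in present else 0 for term in vocabulary]


def multinomial_vector(text, vocabulary):
    """Frequency vector: number of occurrences of each term in the document."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


# Hypothetical two-message corpus.
docs = ["win money win prizes now", "meeting agenda for monday"]
vocab = build_vocabulary(docs)
binary = bernoulli_vector(docs[0], vocab)
frequency = multinomial_vector(docs[0], vocab)
```

For the first message, the two vectors differ only at the position of "win": the Bernoulli vector records presence (1), while the Multinomial vector records the count (2). The Bernoulli model therefore ignores repetition, whereas the Multinomial model lets a repeated term weigh more in the classification.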
