Malay Language Text-Based Anti-Spam System Using Neural Network

Hamid A. Jalab, Thamarai Subramaniam, Alaa Y. Taqa

Source Title: Handbook of Research on Threat Detection and Countermeasures in Network Security

DOI: 10.4018/978-1-4666-6583-5.ch013

Abstract

Spam, a form of unauthorized intrusion, has cost businesses and users time and money. The exponential growth of spam emails in recent years has made more accurate and efficient spam filtering necessary. This chapter focuses on creating a text-based anti-spam system for Malay language emails, using a back-propagation neural network, that counters spam efficiently and effectively. The proposed algorithm consists of three stages: pre-processing, implementation, and evaluation. Malay language emails are collected and divided into spam and non-spam. Features are extracted, and document frequency is calculated as a dimension reduction technique. Classifiers are trained to recognize spam and non-spam emails using training datasets. After training, the classifiers are tested on the testing datasets to check whether they can accurately predict spam (or non-spam) emails. The classification results are evaluated, compared, and analyzed in terms of accuracy, precision, and recall, with the aim of providing the best anti-spam solution for Malay language emails.
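The document frequency step mentioned in the abstract can be illustrated with a minimal sketch. It assumes emails are already tokenized, that a term is kept when it appears in at least `min_df` emails, and that features are binary presence flags. The function names, the threshold, and the toy Malay tokens are illustrative assumptions and are not taken from the chapter.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each token, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count a token once per document
    return df

def select_features(docs, min_df=2):
    """Keep tokens whose document frequency is at least min_df (hypothetical threshold)."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)

def to_vector(tokens, vocabulary):
    """Binary feature vector: 1 if the vocabulary term occurs in the email."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

# Toy, made-up tokenized emails (not from the chapter's corpus).
emails = [
    ["percuma", "hadiah", "klik"],
    ["mesyuarat", "esok", "pagi"],
    ["hadiah", "percuma", "percuma"],
]
vocab = select_features(emails, min_df=2)
vectors = [to_vector(email, vocab) for email in emails]
```

Rare terms are dropped before training, so the input layer of the classifier only needs one unit per retained term.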

Chapter Preview

Introduction

Email is one of the most popular communication tools ever invented. It has driven the growth of internet usage since it was introduced, and it allows users to communicate with each other at low cost while providing an efficient message delivery system. However, the simplicity and low cost of sending an email have paved the way for unsolicited emails. Individual users and businesses can send thousands of emails to recipients at any given time. These emails, also known as spam, are neither requested nor required by the recipients. Spam contains either harmless marketing information or malicious code, such as viruses, that could cause data loss, leading to inconvenience and/or economic loss for the recipients. Unsolicited emails are widely viewed as a serious threat to the internet because they clog users' inboxes and cost businesses billions of dollars in wasted bandwidth (Cournane & Hunt, 2004). To combat spam, researchers and developers have created many anti-spam tools. The basic function of an anti-spam tool is to filter emails by separating spam from genuine mail and moving the spam into a junk mailbox. Various methods and standards are used to fight the spam nuisance (Subramaniam, Jalab, et al., 2010).

Many studies on spam filtering have produced effective and efficient methods for detecting and blocking spam email. However, these studies were mainly performed on English language spam. Because human languages differ in nature, the methods (preprocessing and machine learning algorithms) used for English language spam detection can limit the performance of a classifier when applied to other languages (Özgür, Güngör, et al., 2004; Pang, Feng, et al., 2007). Özgür et al. (2004) proposed dynamic spam filtering methods based on Artificial Neural Network and Bayesian algorithms for agglutinative languages, and for Turkish in particular, which has a complex morphology. They performed five different experiments using Single Layer Perceptron (SLP), Multi-Layer Perceptron (MLP), and Bayesian classifiers with three different feature vector sizes. Their experiments showed that some non-Turkish words that occurred frequently in spam mail were better classified than most Turkish words.
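To make the Single Layer Perceptron concrete, here is a minimal sketch of a threshold unit trained with the classic perceptron learning rule. It is not the configuration used by Özgür et al.; the logical AND data set, the learning rate, and the epoch count are assumptions chosen only because AND is a small, linearly separable problem.

```python
def predict(weights, bias, x):
    """Threshold transfer function: output 1 if the weighted sum is positive."""
    s = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, targets, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights by the error on each sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            error = t - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Logical AND: linearly separable, so a single layer can learn it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
weights, bias = train_perceptron(X, y)
outputs = [predict(weights, bias, x) for x in X]
```

A problem that is not linearly separable, such as XOR, cannot be learned by this unit, which is one reason multi-layer networks are also evaluated in such studies.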

Dong, Cao, et al. (2006) indicated that the segmentation of Chinese words in email restricts the performance of existing spam filters. They used a Bayesian spam filter based on cross N-grams on the CCERT (Computer Emergency Response Team) data set, of which 940 were spam emails and 1400 were non-spam Chinese language emails. These emails were then partitioned into 10 parts. Crossed N-grams of 5 characters and three different feature selection methods were used: Mutual Information, Odds Ratio, and the χ² statistic (CHI). A comparison of all three feature selection methods was reported based on [...] and [...]. They concluded that the Odds Ratio selection scheme produced the best result and that errors can be further reduced by combining it with rule-based methods.
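As an illustration of two of the feature selection scores named above, the following sketch computes the χ² statistic and the odds ratio for a single term from a 2x2 table of email counts. These are the standard textbook forms; the exact formulas used by Dong et al. are not given here, and the counts in the example are made up.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class table.

    a: spam emails containing the term
    b: non-spam emails containing the term
    c: spam emails without the term
    d: non-spam emails without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denominator

def odds_ratio(a, b, c, d):
    """Odds of the term appearing in spam divided by its odds in non-spam.

    Assumes b and c are non-zero; real corpora usually need smoothing.
    """
    return (a * d) / (b * c)

# Made-up counts: the term appears in 30 of 100 spam and 10 of 100 non-spam emails.
chi2 = chi_square(30, 10, 70, 90)
ratio = odds_ratio(30, 10, 70, 90)
```

Terms are then ranked by score and the top-ranked ones are kept as features. A high odds ratio favors terms that are characteristic of spam, while χ² rewards any strong association with either class.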

Pang et al. (2007) used a Support Vector Machine, adopting the tri-gram language model for word segmentation of Chinese emails and applying Discount Smoothing algorithms to overcome the sparse data problem. An automaton machine identifies the different factoid words. They experimented with the LingSpam (English email) data set and the CCERT data set of Chinese emails, and compared Maximum Entropy, Bayesian, Bayesian with Good-Turing, Bayesian with Absolute Smoothing, and Support Vector Machine classifiers.

Anh, Anh, et al. (2008) reported that the token segmentation of the Bayesian filter performed less effectively when detecting Vietnamese language spam. They therefore proposed a Vietnamese segmentation approach for token selection based on language classification and Bayesian filtering. They implemented two filters: one with token segmentation based on whitespace, and one with token selection based on the Vietnamese segmentation approach. The results showed that Vietnamese segmentation token selection coupled with a Bayesian classifier detected spam more effectively, with 9% higher accuracy than other segmentation techniques.

Key Terms in this Chapter

Single Layer Perceptron (SLP): A feed-forward network based on a threshold transfer function. SLP is the simplest type of artificial neural network and can only classify linearly separable cases with a binary target (1, 0).

False Positive (FP): A non-spam email which is classified as spam is referred to as False Positive (FP).

Automatically Defined Group (ADG): A rule extraction method used for classification.

False Negative (FN): Spam email that is classified as a non-spam email is referred to as False Negative (FN).
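The False Positive and False Negative definitions feed directly into the accuracy, precision, and recall measures named in the abstract. The sketch below assumes spam is the positive class (label 1) and uses the standard definitions of the three measures; the labels in the example are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` (spam) as the positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif guess == positive:
            fp += 1  # non-spam classified as spam (False Positive)
        elif truth == positive:
            fn += 1  # spam classified as non-spam (False Negative)
        else:
            tn += 1  # non-spam correctly passed
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels: 1 = spam, 0 = non-spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
```

In spam filtering a False Positive is usually the costlier error, since a legitimate email is hidden from the user, so precision is often watched more closely than recall.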

Mean Squared Error (MSE): A measure of performance of a point estimator. It measures the average squared difference between the estimator and the parameter.

Learning Vector Quantization (LVQ): A neural net that combines competitive learning with supervision. It can be used for pattern classification.

Self-Organizing Map (SOM): One of the most popular neural network models. It belongs to the category of competitive learning networks. The Self-Organizing Map is based on unsupervised learning.

Back-Propagation Neural Network (BPNN): A neural network based on the function and structure of the human brain or biological neurons. The network of neurons is trained with a training dataset: the output is compared with the desired output, and the error is propagated back toward the input until the minimal MSE is achieved.
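The BPNN and MSE definitions above can be tied together in a small sketch: a network with one hidden layer of sigmoid units, trained by per-sample back-propagation, with the MSE measured before and after training. The layer sizes, learning rate, epoch count, random seed, and toy feature vectors are all assumptions for illustration and do not reflect the chapter's actual network.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyBPNN:
    """One hidden layer, sigmoid units, trained by per-sample back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row holds the input weights followed by a bias weight.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Output delta: gradient of 0.5 * (out - target)^2 through the sigmoid.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas: the output error propagated back through w_out.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * v

def mse(net, X, y):
    """Mean squared error between network outputs and desired outputs."""
    return sum((t - net.forward(x)[1]) ** 2 for x, t in zip(X, y)) / len(X)

# Toy binary feature vectors (made up): the first feature marks spam.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]

net = TinyBPNN(n_in=3, n_hidden=2, seed=0)
mse_before = mse(net, X, y)
for _ in range(2000):
    for x, t in zip(X, y):
        net.train_step(x, t, lr=0.5)
mse_after = mse(net, X, y)
predictions = [1 if net.forward(x)[1] > 0.5 else 0 for x in X]
```

Training here runs for a fixed number of epochs. A practical system would more likely stop when the MSE on held-out data stops improving.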
