Detection and Prediction of Spam Emails Using Machine Learning Models

Salma P. Z, Maya Mohan

Source Title: Handbook of Research on Cyber Crime and Information Privacy

DOI: 10.4018/978-1-7998-5728-0.ch011

Abstract

Email is one of today's most important means of communication. Its extensive use has led to many problems, the most serious being spam, one of the major issues on today's internet. Spam emails consist mostly of advertisements and offensive content. They are sent without the recipient's request, are generally annoying and time-consuming, and waste space on the communication medium's resources. They cause inconvenience and financial loss to recipients. Hence, there is always a need to filter spam emails and separate them from legitimate ones. Many content-based machine learning techniques have proven effective in detecting and filtering spam emails. Because of the large increase in email spamming, emails are studied and classified as spam or not spam. In this chapter, three machine learning models, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BLSTM), are used to classify emails as spam or benign.

Chapter Preview

Introduction

Electronic mail, or email, is one of today's most prominent and fastest methods of exchanging messages between people using digital devices over the internet. Because the internet is one of the major communication platforms of our age, email has become an essential, easy, and widely used means of communication, and it accounts for a major share of conversations in almost all domains. Its cost-effectiveness and speed have made it popular among users.

Spam emails, often known as junk email, are unsolicited messages sent in bulk by email. These unwanted emails are generally annoying and time-consuming, and they waste space on the communication medium's resources. Spam emails cause many problems, such as reducing the performance of email engines, occupying unnecessary space in the mailbox, and undermining the stability of mail servers. In certain cases, spam also contains viruses, trojans, and other material that may be harmful to certain categories of users (Shrivastava and Anju, 2017). Several studies of spam email report a steady growth of spam emails by 90% from the early 1990s until 2014. Issues related to spam mail have also been growing exponentially over time. Users receive hundreds of spam emails with new content and from new sources, and spammers generate these messages using automated robot software. Organizations suffer financially from the proliferation of spam emails. Spam emails not only invade the user's mailbox but also produce a huge volume of unwanted data, thus limiting the network's usage and capacity. Email spamming is also the first step in targeted attacks on organizations, which is currently an important issue. Email spam is not merely an innocuous waste of time (Alurkar et al., 2017). It is a tool for malicious activities such as spear phishing, whaling, clone phishing, website forgery, and much more. Classifying emails as spam or ham is thus of utmost importance for the user from a security perspective. The issues related to spam mail are escalating with the increased usage of the web. The fact that 48 billion of the 80 billion emails received every day (60%) are spam highlights the importance and urgency of implementing effective email classification procedures (Harisinghaney et al., 2014).

With increasing network bandwidth and improving technology, spam emails have become more sophisticated, and advanced algorithms are needed to build efficient spam filters. Despite the large amount of research in this area, no spam filter is 100% effective. Hence, there is a need to develop a more sophisticated and accurate classifier model to address the problem of spam emails. All emails share a common structure, i.e., a subject and a body. Spam emails are identified through their contents, on the assumption that the content of spam mail differs from that of legitimate (ham) mail. Words used frequently in spam emails relate to products, recommendations of services, dating, and similar topics.
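The content-based assumption above can be illustrated with a minimal sketch that compares word frequencies in spam and ham. The messages, the `tokenize` and `word_counts` helpers, and the "spam-only words" heuristic are all hypothetical illustrations, not data or methods taken from the chapter.

```python
import re
from collections import Counter

# Hypothetical toy messages for illustration; not data from the chapter.
SPAM = [
    "Buy cheap watches now, limited offer",
    "Exclusive offer: meet singles in your area now",
]
HAM = [
    "Meeting moved to Monday, agenda attached",
    "Please review the attached report before Monday",
]


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(messages):
    """Count how often each token occurs across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


spam_counts = word_counts(SPAM)
ham_counts = word_counts(HAM)

# Words seen in spam but never in ham are candidate spam indicators.
spam_only = sorted(word for word in spam_counts if word not in ham_counts)
```

On a corpus this small the resulting word list is noisy (it also contains ordinary words such as "in"), which is one reason practical filters learn weights from large labeled collections rather than relying on raw word lists.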

Spam email detection can be broadly categorized into two approaches (Ma et al., 2009): knowledge engineering and machine learning. Knowledge engineering is a network-based approach in which the IP (Internet Protocol) address and network address, along with a set of defined rules, are considered for email classification. This method is fruitful but bears the limitation of being time-consuming, and updating and maintaining the rules is tedious for users. As an alternative, machine learning techniques, which do not involve any such set of rules, can be used. They are much more efficient than the former method (Guzella and Caminhas, 2009). Several classification algorithms are used, which classify emails based on their content and attributes.
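The contrast between the two approaches can be sketched as follows. This is an illustrative sketch only: the blocked address and phrases are made-up values, and the learned classifier is a small multinomial naive Bayes chosen for brevity, not the RNN, LSTM, or BLSTM models the chapter evaluates. All function and variable names are assumptions introduced for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Knowledge engineering: hand-maintained rules (hypothetical values).
BLOCKED_IPS = {"203.0.113.7"}  # address from a documentation-only range
BLOCKED_PHRASES = ("free money", "act now")


def rule_based_is_spam(sender_ip, body):
    """Flag a message if its sender or any phrase matches a fixed rule."""
    text = body.lower()
    return sender_ip in BLOCKED_IPS or any(p in text for p in BLOCKED_PHRASES)


# Machine learning: parameters are estimated from labeled examples.
def train(examples):
    """Build per-class word counts from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                p = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled examples; not data from the chapter.
TRAINING = [
    ("win free money now", "spam"),
    ("cheap offer act now", "spam"),
    ("project meeting on monday", "ham"),
    ("see the attached project report", "ham"),
]
model = train(TRAINING)
```

With this sketch, `rule_based_is_spam("198.51.100.1", "Act now to claim")` fires only because a listed phrase appears, so every new spam pattern requires a manual rule update. The learned model adapts by retraining on new labeled mail, e.g. `predict(model, "free offer now")`.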

Key Terms in this Chapter

Overfitting: A condition in which a model learns the detail and noise in the training data to the extent that its performance on new data suffers.

Email IMAP: The Internet Message Access Protocol (IMAP) is a mail protocol that a local client machine can use to access emails on a remote mail server.

Malevolent: Having a harmful influence or malicious.

Knowledge Engineering: A field of artificial intelligence that tries to capture the judgment and behavior of a human expert in a given field.

Swarm Intelligence: A concept in artificial intelligence which considers the collective behavior of decentralized, self-organized natural and artificial systems.

Trojan: A type of malicious software disguised as legitimate computer software such as utilities, games, and sometimes even antivirus programs.

Innocuous: Not harmful or offensive.
