A Study of Feature Selection and Dimensionality Reduction Methods for Classification-Based Phishing Detection System

Amit Singh, Abhishek Tiwari

Source Title: International Journal of Information Retrieval Research (IJIRR) 11(1)

DOI: 10.4018/IJIRR.2021010101

Abstract

Phishing was introduced in 1996 and is now the biggest cybercrime challenge. Phishing is an abstract way to deceive users over the internet. The purpose of phishers is to extract users' sensitive information. Researchers have been working on solutions to the phishing problem, but the parallel evolution of cybercrime techniques has made it a tough nut to crack. Recently, machine learning-based solutions have been widely adopted to tackle the menace of phishing. This survey paper studies various feature selection and dimensionality reduction methods and examines how they perform with machine learning-based classifiers. The selection of features is vital for developing a well-performing machine learning model. This work compares three broad categories of feature selection methods, namely filter, wrapper, and embedded methods, for reducing the dimensionality of data. The effectiveness of these methods has been assessed on several machine learning classifiers using k-fold cross-validation score, accuracy, precision, recall, and time.

1. Introduction

In phishing, the phisher creates a fraudulent website to mislead web users and steal their sensitive personal information. Phishing works by deception: the attacker poses as a trusted entity in electronic communication. Phishing was first discovered in the 1980s. The Anti-Phishing Working Group (APWG) reported 51,401 unique phishing websites in June 2018 (Chiew, Tan, Wong, Yong, & Tiong, 2019; Phishing Activity Trends Report 2nd Quarter 2018, 2018). Another report by RSA estimated that global organizations lost $9 billion to phishing fraud in 2016 (Heidi Bleau, 2016). It is one of the biggest cybercrimes faced by internet users. Generally, phishing attacks are carried out using email and website spoofing. Phishers start the attack by sending spoofed emails to victims, who believe the messages are authentic and secure and are thereby trapped. Figure 1 represents the workflow of phishing.

Figure 1. Phishing workflow

Apart from email, phishers lead users through advertisement links to websites that imitate authentic, secure, and well-known sites. There are many ways to detect and prevent phishing, such as authorized anti-phishing software, naive browser extensions (Google and Mozilla Firefox use a blacklist warning system), and toolbars. A blacklist warning system queries a database of already known phishing URLs, so it cannot identify new phishing websites (Chiew et al., 2019). An intelligent phishing detection system based on a machine learning classification model can identify whether a website or web link is used for phishing. These ML-based classification systems are very effective. However, when building such prediction systems, feature selection and dimensionality reduction are very important steps. An investigation of state-of-the-art approaches reveals the need for a systematic study of feature selection and dimensionality reduction approaches for designing an intelligent and capable system to detect phishing websites.
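The limitation of a blacklist can be made concrete with a minimal Python sketch. The URLs, the set name `KNOWN_PHISHING_URLS`, and the function `is_blacklisted` are hypothetical and chosen only for illustration; real browser blacklists are far larger and use more elaborate matching.

```python
# Hypothetical blacklist of already-reported phishing URLs (illustrative values only).
KNOWN_PHISHING_URLS = {
    "http://paypa1-login.example.com/verify",
    "http://secure-bank.example.net/update",
}


def is_blacklisted(url, blacklist=KNOWN_PHISHING_URLS):
    """Return True only if this exact URL has already been reported."""
    return url.strip().lower() in blacklist


# A URL that is already in the database is flagged.
seen = is_blacklisted("http://paypa1-login.example.com/verify")

# A newly registered look-alike is not in the database yet, so it passes unnoticed.
unseen = is_blacklisted("http://paypa1-login.example.org/verify")

print(seen, unseen)
```

The lookup can only confirm what has been reported before, which is the gap a classifier trained on URL and page features is meant to close.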

Any machine learning classifier needs useful and relevant features, and feature selection is paramount for choosing those features from the dataset. Feature selection is even more useful when dealing with high-dimensional data. A high-dimensional dataset poses many problems, such as increased training time, and it may lead to overfitting of the machine learning model. The feature selection process selects relevant attributes from the data based on the method specified by the analyst (Ameen, Balogun, Usman, & Fashoto, 2016). The reduced feature set helps improve the accuracy of the classifier and decreases its computational cost. There are three main categories of feature selection techniques: filter methods, wrapper methods, and embedded methods. Each has its own significance, and we discuss them in Section 3.
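As an illustration of the filter category only, the sketch below scores each feature independently of any classifier and keeps the highest-scoring ones. The scoring function (absolute Pearson correlation with the label), the feature names, and the toy data are assumptions made for this example; the preview does not state which filter criteria the paper evaluates.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / sqrt(var_x * var_y)


def select_top_k(rows, labels, feature_names, k):
    """Filter method: score every feature on its own, then keep the k best."""
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(feature_names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Toy data with hypothetical binary URL features; label 1 = phishing, 0 = legitimate.
FEATURES = ["has_ip_address", "has_at_symbol", "uses_https", "url_is_long"]
ROWS = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
LABELS = [1, 1, 1, 0, 0, 0]

selected, scores = select_top_k(ROWS, LABELS, FEATURES, k=2)
print(selected)
```

A wrapper method would instead retrain a classifier on candidate feature subsets, and an embedded method would read feature importance out of the trained model itself; both cost more computation than this per-feature scoring.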

Dimensionality reduction is another feature preprocessing technique applied before designing a classifier. It transforms the dataset into a lower-dimensional dataset while preserving the meaning of the data. When the dimensionality of a dataset is reduced, the classifier's performance can improve compared with training on the original data. Dimensionality reduction can be linear or nonlinear, depending on the dataset.
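For the linear case, principal component analysis (PCA) is one widely used technique; it is shown here as an assumption, since the preview does not name the methods the paper studies. The sketch finds the first principal component with power iteration, a simple stand-in for a full eigendecomposition, and projects two-dimensional toy data onto it. All function names and data values are illustrative.

```python
from math import sqrt


def center(rows):
    """Subtract the per-column mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Power iteration: converges to the unit direction of largest variance."""
    d = len(cov)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero covariance: every direction is equivalent
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Map each row to its coordinate along the given direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in centered]


# Two strongly correlated toy features (the second is roughly twice the first).
DATA = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]

centered = center(DATA)
component = first_component(covariance(centered))
reduced = project(centered, component)  # one value per sample instead of two
print(component, reduced)
```

Because the two toy features move together, a single projected coordinate retains almost all of their variance, which is the sense in which the transformed data keeps the meaning of the original.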

Feature selection and dimensionality reduction are both used in designing the best machine learning classification model. The difference is that feature selection chooses a subset of the features in the original dataset, whereas dimensionality reduction transforms the original dataset into a space of lower dimensionality.

Machine learning focuses on developing computational algorithms that find patterns, reasoning, and rules in data in order to build a model that can detect or predict forthcoming occurrences (Ali, 2017). Learning is supervised if the outputs are provided with the training data; otherwise, it is unsupervised. Many supervised learning algorithms work successfully in real-life applications. Some popular machine learning classification techniques are the support vector machine (SVM), the Naïve Bayes classifier, k-nearest neighbors (KNN), decision trees, random forests, and ensemble methods. These classification models are used to classify new incoming data as either positive (one) or negative (zero).
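To make the supervised setting concrete, the following sketch implements a small k-nearest-neighbors classifier and evaluates it with k-fold cross-validation, reporting accuracy, precision, and recall. The two numeric features, the toy samples, and the interleaved fold assignment are assumptions for illustration and are not the paper's dataset or protocol.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, folds):
    """Split range(n) into `folds` interleaved, non-overlapping test sets."""
    return [list(range(start, n, folds)) for start in range(folds)]


def cross_validate(xs, ys, folds=3, k=3):
    """Pool predictions over all folds and compute the three metrics."""
    tp = fp = tn = fn = 0
    for test_idx in k_fold_indices(len(xs), folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in test_idx:
            predicted = knn_predict(train_x, train_y, xs[i], k)
            if predicted == 1 and ys[i] == 1:
                tp += 1
            elif predicted == 1 and ys[i] == 0:
                fp += 1
            elif predicted == 0 and ys[i] == 0:
                tn += 1
            else:
                fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical features per URL: [number of dots, length in tens of characters].
# Labels alternate so every fold contains both classes: 1 = phishing, 0 = legitimate.
X = [[5, 9], [1, 2], [6, 8], [2, 1], [5, 8], [1, 1],
     [6, 9], [2, 2], [7, 9], [1, 3], [5, 7], [2, 3]]
Y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

metrics = cross_validate(X, Y, folds=3, k=3)
print(metrics)
```

The toy classes are deliberately well separated, so the scores say nothing about real phishing data; the point is only how each sample is predicted by a model that never saw it during training.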

In summary, we make the following contributions in this survey paper:
