1. Introduction
Electronic mail, or email for short, has become one of the most popular web and mobile applications. It is a widely used Internet service that lets users communicate worldwide by sending and receiving messages through an email address. More plainly, a user can send and receive text files, images, PDF files, etc., to or from an individual or a group of individuals over a network (Golan et al., 2015). With the cheaper bandwidth offered by network service providers, even a novice user can use email with ease. With the advent of mobile phones, and especially of mobile apps, usage has risen sharply. Most current mail providers offer email services at no cost up to a certain amount of storage. This is one of the main reasons that email service providers account for a notable share of Internet traffic. According to the statistics, there are around 6 billion active email users across the globe who use email regularly for different purposes. Of these, around 2 billion people are actively engaged with emails related to their personal or business purposes (Stolfo et al., 2019; Ashminov & Stein, 2019).
On the one hand, this seems to be one of the greatest achievements of the current tech-driven world; unfortunately, it has also become a business strategy for spammers, who flood users’ mailboxes with their business-related emails and profit from them (Hatton & John, 2017). Spam forces users to spend time sorting their mail into the desired categories. Most users are unhappy with spam because it fills the inbox with unwanted messages and wastes their time on unsolicited emails every time they open their mailboxes. Keeping a mailbox clean by detecting and deleting spam manually is not practical, so automated spam filtering tools are needed. These tools must analyse all incoming mail effectively and categorise it as relevant or irrelevant. Many automated mail classifiers have been developed so far, and they are used extensively by mail service providers to classify incoming emails into different categories. However, many of them do not fully solve the problem of eliminating spam from the inbox (Khan et al., 2015; Yu & Xu, 2008).
Most datasets from real-world applications contain numerous features that may or may not be relevant to the solution. Only the relevant features should be extracted from the dataset; removing unwanted and redundant features has a direct impact on the results. The performance of Machine Learning strategies relies on the type and number of features extracted from the given problem. In fact, these strategies depend mainly on how the features of the dataset are selected to obtain a proper solution to the given problem. In general, feature extraction is done either by a human expert or by automated feature extraction tools such as Principal Component Analysis, Deep Belief Networks, fuzzy-based systems, rule-based techniques, etc. (Idris et al., 2014; Idris et al., 2015; Wu, 2009). In this paper, we present a new automated optimal feature extraction method for classifying the Enron-spam dataset. A metaheuristic recently added to Computational Intelligence, the Northern Bald Ibis optimization algorithm, is used to extract the optimal feature subset.
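To make the idea of searching for an optimal feature subset concrete, the following is a minimal sketch of wrapper-style feature selection. It is not the method proposed in this paper: a plain random search stands in for the Northern Bald Ibis optimization algorithm, a nearest-centroid rule stands in for the SVM classifier, and the toy data, the function names, and the weight `alpha` are assumptions made only for illustration.

```python
import random

# Toy data (hypothetical): each sample is (feature_vector, label), label 1 = spam.
# Feature 0 separates the classes; features 1 and 2 are noise.
TRAIN = [([5.0, 1.0, 0.0], 1), ([6.0, 0.0, 1.0], 1),
         ([0.0, 1.0, 0.0], 0), ([1.0, 0.0, 1.0], 0)]
TEST = [([5.5, 0.0, 0.0], 1), ([0.5, 1.0, 1.0], 0)]


def project(row, mask):
    """Keep only the features whose mask entry is True."""
    return [v for v, keep in zip(row, mask) if keep]


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def accuracy(train, test, mask):
    """Nearest-centroid accuracy on `test` using only the masked features."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(project(x, mask))
    cents = {y: centroid(rows) for y, rows in by_label.items()}
    correct = 0
    for x, y in test:
        p = project(x, mask)
        pred = min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, cents[c])))
        correct += pred == y
    return correct / len(test)


def fitness(mask, train, test, alpha=0.9):
    """Reward accuracy and, with weight 1 - alpha, smaller feature subsets."""
    kept = sum(mask) / len(mask)
    return alpha * accuracy(train, test, mask) + (1 - alpha) * (1 - kept)


def random_search(train, test, n_features, iters=200, seed=0):
    """Stand-in optimizer: sample random masks and keep the fittest one."""
    rng = random.Random(seed)
    best_mask, best_fit = None, -1.0
    for _ in range(iters):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        fit = fitness(mask, train, test)
        if fit > best_fit:
            best_mask, best_fit = mask, fit
    return best_mask, best_fit


if __name__ == "__main__":
    print(random_search(TRAIN, TEST, n_features=3))
```

A metaheuristic would replace `random_search` with a guided update of the candidate masks, while the fitness function (classifier accuracy traded off against subset size) could stay the same in spirit.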
The remainder of the paper is organized as follows. Section II reviews the literature. Section III presents the existing optimal feature selection method, in which the improved whale optimization algorithm is described. Section IV presents the proposed optimal feature selection method, covering the standard Northern Bald Ibis Algorithm and the SVM classification technique. Section V provides the analysis of the experimental results. The last section presents the conclusions of the paper and the future scope.