Thwarting Spam on Facebook: Identifying Spam Posts Using Machine Learning Techniques

Arti Jain, Reetika Gairola, Shikha Jain, Anuja Arora

Source Title: Social Network Analytics for Contemporary Business Organizations

DOI: 10.4018/978-1-5225-5097-6.ch004

Abstract

Spam on online social networks (OSNs) is becoming a prominent problem for the users of these networks. Spammers use a range of techniques to deceive OSN users for their own benefit. Facebook, one of the leading OSNs, is experiencing this problem at an alarming rate. This chapter presents a methodology to separate spam from legitimate posts using three machine learning techniques: naïve Bayes (NB), support vector machine (SVM), and random forest (RF). Textual, image, and video features are used together, a combination that earlier researchers did not consider. A total of 1.5 million posts and comments are extracted from archival and real-time Facebook data and pre-processed using RStudio. Of the 30 features identified, 10 are the most informative for distinguishing spam from ham posts. The entire dataset is shuffled and split in three training-to-testing ratios, of which the 80:20 ratio gives the best result. The RF classifier outperforms NB and SVM, achieving an overall F-measure of 89.4% on the combined feature set.

Introduction

In today’s world, changes in Internet technology have led to the widespread use of Online Social Networks (OSNs)¹ (Andreassen et al., 2016; Brown et al., 2008; Egele et al., 2017; Panicker & Devadas, 2015), also known as the changing web. The surging popularity of these OSNs has attracted a huge number of users to their platforms, which results in users’ personal information being shared and stored on these networking sites. These networks help users interact, exchange information, and collaborate with their social circle. OSNs also help users communicate with their social community and keep them updated in domains such as news, active learning, job searching, and web application development. Such valuable information has aroused the interest of spammers², who take advantage of the trust among users to deceive them for the spammers’ own benefit (Adewole et al., 2017). Facebook is currently one of the leading OSNs in the world, with over 2.05 billion³ monthly active users. It is around five times larger than its next largest counterpart, Twitter. A survey⁴ of 5,173 adults suggests that 30% of people get their news from Facebook, while only 8% get news from Twitter and 4% from Google Plus. Facebook users use the platform not only to communicate with their friends but also to keep up with what is happening around the globe.

Spammers are now discovering different ways to reach OSN users in order to spread spam messages (Bhat & Abulaish, 2013; Prieto et al., 2013), thereby making OSNs vulnerable and exposed targets. Most of these spam messages are sent in high volume so that they can influence a large number of users in a short span of time. Moreover, these messages consume inbox storage. They are targeted at a specific audience or are used to perform tricks such as phishing and identity theft (Gao et al., 2012). Spam, which earlier took the form of text containing information irrelevant to users, is now also being spread through images and videos. To do so, spammers evade automated filters using techniques such as obfuscating keywords, wrapping long URLs, and using images or videos instead of textual content.

In this chapter, a methodology to separate spam from ham posts⁵ on Facebook is provided. It combines textual, image, and video features using three supervised Machine Learning (Shalev-Shwartz & Ben-David, 2014) techniques, namely Naïve Bayes (NB) (Lee et al., 2010), Support Vector Machine (SVM) (Shalev-Shwartz & Ben-David, 2014), and Random Forest (RF) (Breiman, 2001), implemented in RStudio®⁶. Our methodology comprises several stages. First, data is extracted from Facebook posts, which contain text, images, and videos. This is followed by the data pre-processing (DP) stage, which consists of stop word removal, stemming, lemmatization, photo pre-processing, and URL blacklisting. The next stage extracts the features relevant to identifying whether a post is spam or not. In this stage, a total of 30 features are extracted from Facebook, including “status type”, “created time”, “updated time”, and “name of the photo”. The data is then shuffled and split into training and test sets in three ratios: 60:40, 70:30, and 80:20. ML techniques are then applied to train the three classifiers (NB, SVM, and RF). In the testing phase, these classifiers are used in RStudio to predict whether a given post is spam or ham. Finally, the classifiers are evaluated using the standard metrics of precision, recall, and F-measure.
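The shuffle-and-split step and the evaluation metrics described above can be illustrated with a minimal sketch. The chapter itself works in RStudio; the Python below is only an illustration using the standard library, and the function names (`shuffle_split`, `precision_recall_f1`), the fixed seed, and the toy labelled posts are assumptions made for this example, not part of the authors' implementation or data.

```python
import random


def shuffle_split(items, train_fraction, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall and F-measure for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data: 100 labelled posts, every fourth one marked as spam.
    posts = [(f"post {i}", "spam" if i % 4 == 0 else "ham") for i in range(100)]

    # The three training:test ratios named in the text.
    for fraction in (0.6, 0.7, 0.8):
        train, test = shuffle_split(posts, fraction)
        print(f"{fraction:.0%} train -> {len(train)} train posts, {len(test)} test posts")

    # Hypothetical predictions from some classifier on five test posts.
    y_true = ["spam", "spam", "ham", "ham", "spam"]
    y_pred = ["spam", "ham", "spam", "ham", "spam"]
    p, r, f = precision_recall_f1(y_true, y_pred)
    print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

In this sketch the F-measure is the harmonic mean of precision and recall for the spam class. Whether the chapter reports per-class or averaged scores is not stated in this preview, so that choice is also an assumption.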
