“TwitterSpamDetector”: A Spam Detection Framework for Twitter

Abdullah Talha Kabakus (Düzce University, Düzce, Turkey) and Resul Kara (Düzce University, Düzce, Turkey)
Copyright: © 2019 | Pages: 14
DOI: 10.4018/IJKSS.2019070101

Abstract

Twitter is the most popular microblogging platform, which lets users post status messages called tweets. This popularity, together with the advanced API that Twitter provides to read and write Twitter data programmatically, attracts the attention of spammers as well as legitimate users. Since Twitter has some unique characteristics, traditional spam detection methods cannot be directly used to detect spam on Twitter. Therefore, a spam detection framework designed specifically for Twitter, namely TwitterSpamDetector, is proposed in this paper. TwitterSpamDetector uses Twitter-specific features to detect spam on Twitter. A total of 77,033 tweets posted by 50,490 users were collected using the API provided by Twitter. Naive Bayes is used to train TwitterSpamDetector on the selected Twitter features, which clearly distinguish spammers from legitimate users. According to the evaluation results, TwitterSpamDetector's accuracy and sensitivity are 0.943 and 0.913, respectively.
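For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and sensitivity are conventionally computed from true and predicted labels, assuming "spam" is treated as the positive class. The function name `accuracy_and_sensitivity` and the label lists are hypothetical and purely illustrative; they are not the paper's data and do not reproduce its results.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive="spam"):
    """Return (accuracy, sensitivity) for two equal-length label sequences.

    accuracy    = correct predictions / all predictions
    sensitivity = true positives / (true positives + false negatives)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled example is required")

    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = correct / len(pairs)
    # Sensitivity is undefined when there are no positive examples.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical labels for illustration only (not taken from the paper).
y_true = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham", "spam"]

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(acc, sens)  # prints: 0.8 0.75
```

In this toy example one of the four spam tweets is missed, so sensitivity (0.75) is lower than accuracy (0.8); sensitivity isolates how many actual spammers a detector catches.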

1. Introduction

Web 2.0 introduced the ability for visitors to add content to the web instead of just reading it. Social networking platforms such as Facebook, Instagram, Twitter, and Pinterest have risen on the technologies of Web 2.0. Twitter, founded in 2006, is currently the most popular microblogging platform; it lets users post status messages, called “tweets”, limited to 140 characters. Twitter has declared that it has 313 million monthly active users who post 500 million tweets per day. Since its debut, Twitter has been one of the most popular forums of electronic communication (Kiss, Horváth, & Buzás, 2015). Users of Twitter post tweets on a large variety of topics including politics, news, events, personal ideas, celebrities, and technology (Benevenuto, Magno, Rodrigues, & Almeida, 2010; Bravo-Marquez, Mendoza, & Poblete, 2013; Moriya & Ryoke, 2013; Pak & Paroubek, 2010; Tumasjan, Sprenger, Sandner, & Welpe, 2010). Twitter lets its users follow other users to track their interests. Unlike in most social networks, this relation is not necessarily two-way, which means that a user may not follow someone who follows them. Users see the tweets posted by the users they follow, sorted by post date, on their timeline. Alongside this timeline, a list of trending topics (aka TT), which are the most talked-about topics at a given point in time, is shown on the homepage. Thanks to the trending topics, users become aware of the most popular topics near their location or a location they prefer. Users tend to post their tweets using hashtags, which are words or word groups starting with the character “#” embedded in tweets, in order to (1) attract the attention of other users who track the hashtags to keep up with news related to the topics they are interested in, and (2) define the topic or the emotion of the tweet. Twitter provides an advanced API to read and write Twitter data programmatically.
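The hashtag convention described above can be illustrated with a short sketch that pulls hashtags, mentions, and links out of a tweet's text. The regular expressions are deliberate simplifications and are not Twitter's official tokenization rules; the function name `extract_entities` and the sample tweet are hypothetical, and nothing here is claimed to be the feature set used by TwitterSpamDetector.

```python
import re

# Simplified, assumed patterns (not Twitter's official entity rules).
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(text):
    """Return the hashtags, mentions, and URLs found in a tweet's text."""
    urls = URL_RE.findall(text)
    # Remove URLs first so that a fragment such as "#section" inside a
    # link is not mistaken for a hashtag.
    stripped = URL_RE.sub(" ", text)
    return {
        "hashtags": HASHTAG_RE.findall(stripped),
        "mentions": MENTION_RE.findall(stripped),
        "urls": urls,
    }


# Hypothetical tweet text for illustration only.
tweet = "Win a free phone! http://example.com/win #giveaway #free @some_user"
entities = extract_entities(tweet)
print(entities["hashtags"])  # prints: ['giveaway', 'free']
print(entities["mentions"])  # prints: ['some_user']
print(entities["urls"])      # prints: ['http://example.com/win']
```

Counts derived from such entities (for example, the number of hashtags or links per tweet) are one plausible kind of Twitter-specific signal, though which signals are actually informative is an empirical question.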
While an average of 500 million new tweets are posted per day (“Twitter Usage Statistics - Internet Live Stats,” 2019), Twitter receives 15 billion API calls daily, which is almost three times more than any other popular social media platform, such as Facebook or Instagram, or search engine, such as Google, Yandex, or Bing. This easy-to-use environment and popularity attract the attention of spammers, who post unsolicited tweets (called spam) (Blanzieri & Bryl, 2008; Drucker et al., 1999; Song et al., 2011) by hijacking trending topics and abusing the reply, hashtag, or mention functions against legitimate users in order to propagate advertisements and pornography, share harmful links that direct users to malicious content, create fake trending topics, sell fake followers, contaminate the Twitter Streaming API, and phish users (Benevenuto et al., 2010; Boyd & Heer, 2006; Echeverría & Zhou, 2017; Gong et al., 2012; Grier et al., 2010; Jagatic et al., 2007; Kamble & Sangve, 2018; Lee et al., 2011; Song et al., 2011; Wang, 2010; Zhang & Paxson, 2011). Real-time analysis services track and analyze real-time tweets to reveal trends all over the world with minimum delay. Similarly, sentiment analysis systems draw conclusions about a topic by analyzing the tweets related to it, which turns Twitter into a useful real-time polling system (Go et al., 2009; Jiang et al., 2011; Liu et al., 2015; Montejo-Ráez et al., 2014; Schumaker et al., 2016). The need for those services has arisen from the huge volume of data, which makes manual analysis labor-intensive and time-consuming. However, the eventual performance of those services relies completely on the ability to filter spammers from legitimate users (Echeverría & Zhou, 2017). Spam on Twitter is still widespread despite the serious actions taken by Twitter against it, and it poses serious security threats to legitimate users.
Twitter used to let users report spammers by posting tweets that mention the spammer(s) to Twitter's official “@spam” account (C. Chen et al., 2016; Song et al., 2011; Wang, 2010); Twitter has since reported that this method is outdated. As a result, this account no longer exists. The authors think that this method was labor-intensive and not fast enough to block spammers before they carry out their malicious actions, considering that a huge number of spammers still exist on Twitter. Also, Wang (2010) reports that this method is abused by both hoaxes and spam. Twitter has revealed that 8.5% of its monthly active users, approximately 23 million users, have automatically contacted its servers for regular updates. According to a recent report by Echeverría and Zhou (2017), the Star Wars botnet controls 350,000 bots, which is reported to be large enough to contaminate the Twitter API and the Twitter environment itself. Twitter admits that even its own approach generates false negatives; in one reported example, Twitter recommended following bots instead of legitimate accounts. A study reports that 83% of the users of social networking platforms have received at least one unwanted interaction (which may be a friend request or a message) (A Study of Social Network Scams, 2008). Spam filtering on Twitter is also necessary because users of social networking platforms do not show an adequate understanding of these threats: Bilge et al. (2009) report that 45% of users on a social networking platform are ready to click on links shared by their “network”, even though they may not know those people in real life.
