Introduction
A recent investigation by Gartner (Titcomb & Carson, 2021) predicts that by 2022 social media will be dominated by phony news rather than genuine news. As the number of social media users surges, phony news stands to leave a large footprint on the internet. Phony news is usually defined as fabricated news containing misinformation, rumours and falsified facts spread over traditional media and even social media (Thota et al., 2018). The aim of any phony news circulator is to profit from the sensationalism it creates, to cheat readers, or to damage the reputation of a person or organisation (Liu & Yang, 2019). To counter this, early detection of phony news is required (Allcott & Gentzkow, 2017). Current techniques lack the sophistication to tackle phony news circulating on social media platforms such as Reddit, WhatsApp, blogs, Twitter and Facebook (Bourgonje et al., 2017). Cyber security authorities around the world have reported a new form of phony trick, clickbait, wherein the perpetrator lures an unsuspecting user into clicking on phony news by offering gifts (Elisa & Jeffrey, 2017). A widely cited study reported that in 2017, 67% of U.S. citizens aged 18 or older consumed news mainly from social media (Vosoughi et al., 2018). According to some researchers, phony news propagates faster and penetrates deeper into society than genuine news (Conroy et al., 2015). It has therefore become imperative to detect and stem the genesis and flooding of phony news on social media. Phony news detection is a herculean task, as it involves cross-checking a news item against trustworthy entities such as newspapers, media houses and government agencies. Computer models built with NLP techniques and ML algorithms can be used to classify news as either phony or true.
This work developed 12 models by pipelining feature selection algorithms with ML algorithms to detect phony posts on the theOnion and nottheOnion subreddits of the Reddit social networking site. First, exploratory data analysis was performed. Next, NLP techniques were applied to vectorize the words present in the posts, and ML algorithms were then trained on these vectors to create the pipeline models. Finally, coefficient analysis was performed to identify the words that have a positive or negative impact on classifying a post as phony or true.