Flesch-Kincaid Measure as Proxy of Socio-Economic Status on Twitter: Comparing US Senator Writing to Internet Users

Samara Ahmed, Adil Rajput, Akila Sarirete, Tauseef J. Chowdhry
Copyright: © 2022 |Pages: 19
DOI: 10.4018/IJSWIS.297037

Abstract

Social media gives researchers an invaluable opportunity to gain insight into different facets of human life. Researchers place great emphasis on categorizing the socioeconomic status (SES) of individuals to help predict various findings of interest. Forums, hashtags, and chatrooms are common tools for grouping conversations. Crowdsourcing involves gathering intelligence to group online user communities based on common interests. This paper provides a mechanism to look at writings on social media and group them based on their authors' academic background. We analyzed online forum posts from various geographical regions in the US and characterized the readability scores of users. Specifically, we collected 10,000 tweets from members of the US Senate and computed the Flesch-Kincaid readability score. Comparing the Senators' tweets to those of average internet users, we note that 1) the readability rating of US Senators, based on their tweets, is much higher, and 2) the immense difference between the scores of average citizens and those of US Senators is attributed to the wide spectrum of academic attainment.
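
For reference, the Flesch-Kincaid grade level is conventionally computed as 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The following is a minimal Python sketch of that formula, not the authors' implementation. It rests on several assumptions: the abstract does not say whether the grade-level or the reading-ease variant was used, nor how syllables were counted; the function names `count_syllables` and `flesch_kincaid_grade` are illustrative; and the syllable counter is a crude vowel-group heuristic that will miscount some words.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel runs, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a piece of English text."""
    # Sentences are approximated as spans ending in '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    print(flesch_kincaid_grade("The cat sat on the mat."))
```

Tweets typically contain URLs, mentions, and hashtags, so a real pipeline would likely need to strip or normalize those before scoring; that preprocessing is omitted here.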

1. Introduction

1.1. Motivation and Background

Social computing garnered significant attention after the advent of Web 2.0. The extensive use of blogs, Myspace communities, and various online forums affected the way people conducted social interactions (Parameswaran & Whinston, 2017). Social media platforms offer a unique chance to perform social science and online research. They offer users a forum to voice their views unequivocally, since users do not need to reveal their true identity. While many social media platforms today require end-users to confirm their real identity, the process is not always perfect. In addition, federal regulations bind social media companies to protect the real identity of the end-user. The 2004 US presidential campaign, for example, popularized the idea of online advertising and encouraged many scholars to research its influence (Weinberg & William, 2006).

The launch of Amazon Mechanical Turk in 2005 brought a new dimension to the area of Artificial Intelligence (Irani, 2017). The crowdsourcing platform allows users to outsource to humans tasks that would be difficult for a computer to perform. A task is advertised to a group of users, who perform it for an incentive (money, contribution to literature, etc.). Social media platforms have the concept of crowdsourcing embedded in them, as pointed out by Paniagua and Korzynski (2017). As an example, Twitter has been used successfully for crowdsourcing in domains such as emergencies and disaster relief (Jordan et al., 2018), discussed further in the next section. In these scenarios, the experts depended on feedback from volunteers in the affected region, based on which agencies could come up with an appropriate real-time response. Such scenarios come under the umbrella of active crowdsourcing. Passive crowdsourcing, on the other hand, involves soliciting user action without the users consciously realizing that they are contributing. The Twitter hashtag, through which various users contribute to a particular topic, is one example of passive crowdsourcing. In this scenario, people interested in soliciting feedback can start a hashtag that helps gather valuable information.
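
As a small illustration of how hashtags group contributions from many users around one topic, the sketch below buckets tweet texts by the hashtags they contain. It is an assumption-laden toy rather than anything described in the paper: the function name `group_by_hashtag` is hypothetical, the input is a plain list of tweet strings instead of data from the Twitter API, and hashtags are matched case-insensitively with a simple regular expression.

```python
import re
from collections import defaultdict

# A hashtag is approximated as '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the list of tweets that mention it."""
    groups = defaultdict(list)
    for tweet in tweets:
        # Use a set so a tweet repeating the same tag is stored only once.
        for tag in {t.lower() for t in HASHTAG.findall(tweet)}:
            groups[tag].append(tweet)
    return dict(groups)


if __name__ == "__main__":
    # Made-up example tweets, for illustration only.
    sample = [
        "Road closed near the bridge #Flood",
        "Shelter open at the school #flood #help",
        "Nothing tagged here",
    ]
    for tag, items in group_by_hashtag(sample).items():
        print(tag, len(items))
```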

Social and medical science researchers have begun to focus on the vast amount of available data. Although social network data cannot by themselves identify or treat a particular individual's problems, they can be used to identify symptoms that serve as indicators of certain mental health issues (Rajput & Ahmed, 2018a). The techniques developed in the field of Natural Language Processing (NLP) can be invaluable for processing and segmenting text as needed by social and medical science practitioners. The choice of corpus is one of the main requirements for these steps. We use the definition of a corpus as “a collection of naturally occurring text, chosen to characterize a state or variety of a language” (Schvaneveldt et al., 1976). In general, constructing a corpus involves considering text specific to the problem and deriving keywords, bigrams, and sometimes trigrams (two- or three-word sequences) that are used with high frequency in a given domain. As an example, Rajput and Ahmed (2018b) argue that a corpus should be developed to assist mental health professionals in detecting depression among users within a given group of people. The researchers base their observations on the Twitter hashtag #depression. The study gathered the most prominent terms and found that these words are part of the language used by patients with depression. Once such a corpus is established, researchers can examine an arbitrary text and predict, with some degree of confidence, whether the individual uses these words at frequencies similar to those in the corpus.
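
The keyword, bigram, and trigram derivation described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the procedure used in the cited studies: the function name `ngram_counts` is hypothetical, tokenization is a simple lowercase letter-run split, and the example posts are made up.

```python
import re
from collections import Counter


def ngram_counts(texts, n):
    """Count n-grams (as tuples of n lowercase tokens) across a list of texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # zip over n staggered views of the token list yields consecutive n-grams;
        # n-grams never span two different texts.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


if __name__ == "__main__":
    # Made-up posts, for illustration only.
    posts = ["I feel so tired today", "so tired of feeling so tired"]
    print(ngram_counts(posts, 1).most_common(3))  # most frequent keywords
    print(ngram_counts(posts, 2).most_common(3))  # most frequent bigrams
```

Comparing a new text against such counts (for example, by checking how often its most frequent n-grams also rank highly in the corpus) is one simple way to approximate the frequency comparison the paragraph describes; the choice of comparison measure is left open here.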
