1.1. Motivation and Background
Social computing garnered significant attention after the advent of Web 2.0. The extensive use of blogs, Myspace communities, and various online forums changed the way people conduct social interactions (Parameswaran & Whinston, 2017). Social media platforms offer a unique opportunity to conduct social science and online research. They also give users a forum to voice their views openly, since users do not need to reveal their true identity. Although many social media platforms today require end-users to confirm their real identity, the verification process is not always reliable. In addition, federal regulations require social media companies to protect the real identity of their end-users. Online platforms have also shaped politics: the 2004 US presidential campaign, for example, popularized online advertising and encouraged many scholars to study its influence (Weinberg & William, 2006).
The launch of Amazon Mechanical Turk in 2005 brought a new dimension to the field of Artificial Intelligence (Irani, 2017). The crowdsourcing platform allows requesters to outsource to humans tasks that would be difficult for a computer to perform: a task is advertised to a group of users, who perform it in exchange for an incentive (money, a contribution to the literature, etc.). As Paniagua and Korzynski (2017) point out, social media platforms have the concept of crowdsourcing embedded in them. Twitter, for example, has been used successfully for crowdsourcing in domains such as emergency response and disaster relief (Jordan et al., 2018), as discussed further in the next section. In these scenarios, experts depended on feedback from volunteers in the affected region, on the basis of which agencies could formulate an appropriate real-time response. Such scenarios fall under the umbrella of active crowdsourcing. Passive crowdsourcing, on the other hand, involves soliciting user actions without the users consciously realizing that they are contributing. The Twitter hashtag, under which many users contribute posts on a particular topic, is one example of passive crowdsourcing: anyone interested in soliciting feedback can start a hashtag to help gather valuable information.
Social and medical science researchers have begun to focus on the vast amount of available data. Although social network data cannot by themselves be used to identify or treat a particular individual's problems, they can be used to identify symptoms that serve as indicators of certain mental health issues (Rajput & Ahmed, 2018a). Techniques developed in the field of Natural Language Processing (NLP) can be invaluable for processing and segmenting text, as required by social and medical science practitioners. The choice of corpus is one of the main requirements for these steps. We use the definition of a corpus as “a collection of naturally occurring text, chosen to characterize a state or variety of a language” (Schvaneveldt et al., 1976). In general, constructing a corpus involves selecting text specific to the problem and deriving the keywords, bigrams, and sometimes trigrams (two- or three-word sequences) that are used disproportionately often in a given area. For example, Rajput and Ahmed (2018b) argue that a corpus should be developed to assist mental health professionals in detecting depression among a given group of users. The researchers base their observations on the Twitter hashtag #depression. The study gathered the most prominent terms and found that these words are part of the language used by patients with depression. Once such a corpus is established, researchers can examine an arbitrary text and predict, with a certain degree of confidence, whether the individual uses these words with a frequency similar to that observed in the corpus.
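The two steps described above, building a domain corpus of over-represented keywords, bigrams, and trigrams and then comparing a new text against it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by Rajput and Ahmed (2018b): the regular-expression tokenizer, the add-one-smoothed log-ratio score used to decide which n-grams are "used disproportionately often" relative to a general reference collection, the `min_count` and `top_k` cut-offs, and cosine similarity as the frequency-comparison measure are all our own choices. The function names `build_corpus` and `corpus_similarity` and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-word sequence as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, max_n=3):
    """Count unigrams, bigrams and trigrams (up to max_n) across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def build_corpus(domain_texts, reference_texts, top_k=20, min_count=2):
    """Hypothetical helper: keep the n-grams that are over-represented in the
    domain texts relative to a general reference collection.

    Over-representation is an assumed keyness score (add-one-smoothed log
    ratio of relative frequencies), not the cited authors' measure.
    Returns {n-gram: domain count}, ordered from most to least over-represented.
    """
    dom = count_ngrams(domain_texts)
    ref = count_ngrams(reference_texts)
    dom_total, ref_total = sum(dom.values()), sum(ref.values())
    vocab = len(set(dom) | set(ref))
    scores = {}
    for gram, count in dom.items():
        if count < min_count:
            continue  # too rare in the domain texts to be a reliable marker
        p_dom = (count + 1) / (dom_total + vocab)
        p_ref = (ref[gram] + 1) / (ref_total + vocab)
        score = math.log(p_dom / p_ref)
        if score > 0:  # keep only n-grams more frequent in the domain texts
            scores[gram] = score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: dom[gram] for gram, _ in ranked[:top_k]}


def corpus_similarity(text, corpus):
    """Hypothetical helper: cosine similarity between the text's counts of the
    corpus n-grams and the corpus counts themselves (0.0 means no overlap)."""
    counts = count_ngrams([text])
    a = [counts[gram] for gram in corpus]
    b = [corpus[gram] for gram in corpus]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy, made-up sentences for illustration only (not data from any study).
domain = [
    "I feel so tired and hopeless today",
    "feeling hopeless again, I can't sleep",
    "so tired of everything, I feel hopeless",
]
reference = [
    "I feel great after the game today",
    "the game was so fun",
    "can't wait for the weekend",
]

corpus = build_corpus(domain, reference)
print(list(corpus))
print(round(corpus_similarity("I am so tired and I feel hopeless", corpus), 3))
print(corpus_similarity("We watched the game and it was fun", corpus))
```

Scoring n-grams against a reference collection, rather than ranking raw counts alone, is one way to stop terms that are common everywhere from dominating the corpus; on realistic data sizes one would typically also remove stop words and choose a similarity threshold from labelled examples.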