TA-WHI: Text Analysis of Web-Based Health Information

Piyush Bagla, Kuldeep Kumar
DOI: 10.4018/IJSSCI.316972

Abstract

The volume of healthcare data available on social media has exploded in recent years. Cures and treatments suggested by non-medical experts can do more harm than expected, and assuring the credibility of the information conveyed is an enormous challenge. This study aims to categorize the credibility of online health information into multiple classes. It proposes a model named Text Analysis of Web-based Health Information (TA-WHI), built on an algorithm designed for this purpose, that sorts health-related social media feeds into five categories: sufficient, fabricated, meaningful, advertisement, and misleading. The authors created their own labeled dataset for the model. For data cleaning, they designed a dictionary named MeDF containing nouns, adverbs, adjectives, negative words, positive words, and medical terms. Using polarity and a conditional procedure, the data is ranked and classified into multiple classes. The authors evaluate the performance of the model using classifiers such as CNN and LSTM (deep learning) and CatBoost (gradient boosting). The suggested model attained an accuracy of 98% with CatBoost.

Introduction

In the healthcare industry, wrong treatment, misinformation, self-treatment, and myths related to unconventional treatments are not recent developments. They are as ancient as medical care itself. Before the boom of the Internet, radio, and television, this issue was rooted in the therapeutic relationship and its context (Fernández-Celemín & Jung, 2006). Global technological advancement has taken the scale of the damage to an entirely new level. Misinformation on social media became so common that in 2016 the Oxford dictionary introduced “post-truth,” meaning “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief” (Harsin, 2018). For some, posting misleading information on social media has become a fashion.

Social networking services like Facebook, followed by Twitter, are currently the industry leaders, with over 1.3 billion members and a monthly average fluctuation of 300 million people. Every second, their interactions create gigabytes of data (Alrubaian et al., 2018; Ranganath et al., 2017). Online social networks are appealing because they provide a quick and easy way to acquire health information and to share it with others. However, the same speed and ease with which data spreads also enables the broad dissemination of incorrect information. Owing to the pandemic in 2020, social media usage increased manyfold, and more information is now shared on social media than before 2020 (Zhang et al., 2017). The world has seen misinformation about COVID spread like wildfire, and each time the World Health Organization (WHO) or another medical authority has had to step in to deny the news. People are so scared to visit hospitals that they prefer the social media “doctor” (Zwolenski & Weatherill, 2014). Following incorrect therapeutic advice given on social media might be fatal.

Text analysis is the practice of analyzing a vast amount of textual material to capture its key concepts, trends, and hidden relationships. It transforms unstructured text into a structured format to identify meaningful patterns and new insights, and it is a crucial step in uncovering the hidden meaning behind a text. The most popular technique for doing so is sentiment analysis. A number of researchers have used this technique to extract the actual sentiment behind social media posts, especially on Twitter. It has been extended further to incorporate a machine learning algorithm that performs the classification task, so that the credibility of a post can be identified (Alharbi & Alhalabi, 2020; Gunti et al., 2022; Mohammed et al., 2022). People in every country now mix their regional language into their social media posts. In technical terms, this is called code-mixing, and it makes the analysis of text even more difficult. However, techniques such as Machine Learning, Neural Networks, and LSTM (Long Short-Term Memory) can be used to mitigate the problem of code-mixing (Sharma et al., 2021; Singh & Sachan, 2021).

With the development of technology, the amount of data generated on the Internet increases daily. This data includes valuable patterns that must be recognized to obtain meaningful information. Several methods facilitate this task, such as data mining techniques (García-Peñalvo et al., 2021), text mining and privacy preservation techniques in name analysis (Veluru et al., 2015), and scientific issue tracking with topic analysis based on crowdsourcing (Kim et al., 2018). In one way or another, all these methods contribute to the data mining process. However, there is always room for strengthening the capabilities of the proposed approaches.

For example, there is always a question regarding the authenticity of the information patterns found during text mining. Very few studies address this issue, and those that do have several drawbacks. To determine the veracity of web-based health information, they used a predetermined data mining algorithm that operates on an existing dataset. There remains an ongoing need for a strategy based on a user-generated algorithmic approach to determining the credibility of web-based health information using real-time datasets.
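
To make the sentiment-analysis step discussed in this introduction concrete, the following is a minimal Python sketch of lexicon-based polarity scoring, the general idea that such pipelines build on before a classifier is applied. It is an illustration under stated assumptions, not the TA-WHI algorithm: the word lists, the function names `tokenize`, `polarity`, and `label`, and the `threshold` value are all hypothetical, and the toy lexicons are not the MeDF dictionary.

```python
import re

# Hypothetical toy lexicons for illustration only (NOT the MeDF dictionary).
POSITIVE = {"effective", "safe", "proven", "recommended"}
NEGATIVE = {"dangerous", "fake", "harmful", "unproven"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text):
    """Return a score in [-1, 1]: signed lexicon hits divided by total hits.

    A negator flips the polarity of the word that immediately follows it.
    Text with no lexicon hits scores 0.0.
    """
    score = 0
    hits = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1, or 0
        if value:
            score += -value if negate else value
            hits += 1
        negate = False  # negation only reaches the next token
    return score / hits if hits else 0.0


def label(text, threshold=0.25):
    """Map the polarity score to a coarse label (assumed threshold)."""
    s = polarity(text)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["This remedy is safe and effective",
                 "This cure is not safe",
                 "Drink water daily"]:
        print(post, "->", polarity(post), label(post))
```

In a fuller pipeline, a score like this would be one feature among several passed to a trained classifier; the fixed threshold here only stands in for that step.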