COVID-19 Pandemic: Insights of Newspaper Trends

The study analyzes the change in coverage of health-issue awareness on the front pages of two Indian e-papers (The Hindustan Times and The Times of India) between the pre- and peri-coronavirus periods. The collected news articles are examined using the Latent Dirichlet Allocation (LDA) algorithm, and sentiment analysis is performed to analyze the change in the emotions aroused by the news articles. The results for the pre-coronavirus period reveal that the e-papers focused mostly on politics, crime, and the economy, whereas in the peri-coronavirus period they devote more space (about 40% of topics) to news that spreads awareness of the coronavirus disease. The prioritized news topics include the number of active cases, medical facilities, and COVID-19 testing. The sentiment analysis reveals that negative sentiments are prominent in the peri-coronavirus period due to fear of the outbreak of the virus.


INTRODUCTION
Indian mass media, and the news media in particular, play a very important role in shaping the country (Ram, 2011). News media is very active in India: more than 100000 publishers are registered with the Registrar of India, and the country has the second-highest selling newspapers in the world¹. The recent trend in India is a shift towards the online world, as people show interest in e-papers and in social media platforms such as Facebook and Twitter (Sahni and Sharma, 2020). Public health is among the most prominent topics discussed these days, as Indian health care has a very contradictory landscape (Kasthuri, 2018). On one end, people are served with high-tech medical facilities, especially in urban areas, but the scenario is different in remote areas. Various researchers have analyzed the behavior of people towards the pandemic using social media platforms (Jahanbin and Rahmanian, 2020). The Covid-19 pandemic has affected the mass media world. Technology has been used extensively in different fields during the pandemic, especially in the education system (Garcia-Peñalvo, 2020; Garcia-Peñalvo et al., 2020; García-Peñalvo et al., 2021). The pandemic has also affected work and employment (Hodder, 2020) and has given rise to digital inequalities (Beaunoyer et al., 2020). The mass media was even loaded with fake news regarding the pandemic (Apuke and Omar, 2021). Further, numerous researchers have analyzed health communication through the newspapers of various countries.
One study states that the role of mass media in covering important issues has always been remarkable (Sharma and Gupta, 2017). Another study concluded that mass media has been crucial in spreading awareness about health education: it not only spreads awareness but also educates people from time to time. Gupta and Sinha (2010) studied the coverage of health-related messages in print and electronic media. Their results showed that more emphasis is laid on political subjects and less on health-related news articles. That study used a manual method, whereas we use the LDA topic modeling approach, which automatically extracts the frequently covered topics. Liu et al. (2020) collected media reports on Covid-19 and investigated the role of media in China during the ongoing Covid-19 crisis. The results showed that mass media news reports in China lagged behind in the reporting of Covid-19, as the focus of news reports was on the larger society rather than on individuals. The main approach used in that study is topic modeling. In our study, we have used topic modeling to analyze how awareness of health problems was handled in Indian newspapers during the pre- and peri-coronavirus periods. Moreover, sentiment analysis is performed to analyze the change in the emotions aroused by the news articles.
Further, a study of COVID-19 collected the tweets of two Spanish newspapers to understand how Spanish news media cover public health crises on a social media platform; the analysis used two main approaches, topic modeling and network analysis. The main drawback of that study is that it used tweets, which are very short texts compared to full-length articles (Zang et al., 2017). Our study overcomes this drawback by scraping full-length news articles from the e-papers. Barkur and Vibha (2020) studied the attitude of people towards the pandemic by performing sentiment analysis on tweets that originated from India after the lockdown announcement. The main drawback is that the study does not take into account the behavior of people before the outbreak. Our study overcomes this drawback by comparing the sentiments of the pre- and peri-coronavirus periods.
Another study applied the LDA topic modeling technique to extract the most discussed topics and included sentiment analysis of the tweets (Xue et al., 2020). That study relied on tweets, which are not a reliable source because many tweets are fake, whereas newspaper articles are verified before publishing. Researchers have adopted the LDA topic modeling approach in a wide range of fields.

MATERIALS AND METHODS
This section describes the methodology adopted for the discourse analysis of the change in health-issue awareness in the newspapers between the pre- and peri-COVID-19 periods. It comprises data collection, data cleaning, topic modeling (using LDA), and sentiment analysis, as illustrated in Figure 1.

Data Collection
For collecting the news articles from the e-papers, we used an automated scraping tool (Data Miner⁷). The automated process takes less time and effort than manually copying and pasting the required content from web sources. It scrapes data from web pages (here, news articles from e-papers) and delivers the output in CSV file format. In this study, we collected 6347 news articles printed on the front pages of two prominent and reputed Indian newspapers (The Hindustan Times and The Times of India) from July 2019 to June 2020. Data Miner uses XPath, jQuery, and CSS selectors to identify information on an HTML web page. The data is fetched by creating recipes or by using already available recipes. For this extraction, we created our own recipes in the following steps:
• Choose the type of page: list page or detail page. We used the list page because we want multiple rows to be extracted.
• Specify the column names under which the data is to be stored. The data for each column can be selected using the Find tool. We defined two columns: the first extracts the headlines and the second extracts the detailed news from the e-paper.
• Save and run the recipe. It then automatically extracts the news headlines and the detailed news into the respective columns.
The number of articles scraped from the e-papers' front pages for the pre-Covid-19 and peri-Covid-19 periods is presented in Table 1. Figure 2 and Figure 3 present the number of articles printed on the front page of each e-paper during the pre- and peri-Covid-19 periods. All segments of the front page are extracted for analysis. The results show that more articles were printed on the front page of The Times of India than of The Hindustan Times, and that for both e-papers more front-page articles were printed in the peri-Covid-19 period than in the pre-Covid-19 period.

Data Preprocessing
The data preprocessing step removes noise from the data. It is one of the most important tasks; if not performed properly, it can lead to errors in the results (Garcia et al., 2015). Preprocessing involves the following main steps⁸:
• Segmentation: breaking down larger strings into smaller chunks called tokens.
• Cleaning: removing stop words and handling capital letters and various other characters.
• Normalization: mapping terms to a common scheme or applying linguistic reductions via stemming and lemmatization.
• Annotation: labeling, adding markup, or part-of-speech tagging.
• Analysis: generalizing the dataset for feature analysis, which is further used for finding the relationships between words.
In this study, the news articles are preprocessed in RStudio using the "tm" package⁹. First, the corpus is created using the "Corpus" function. The "tm_map" function is then used to convert the news articles to lower case and to remove punctuation, stop words, and numeric values. Next, stemming is performed, which reduces words to their stems so that word forms are unified across documents. Extra blank spaces (white space) are also removed during preprocessing. A separate CSV file is created after preprocessing for further analysis.
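The cleaning steps described above can be illustrated with a minimal Python sketch. It is not the "tm" pipeline used in the study: the function name `preprocess` and the tiny stop-word list are assumptions made for illustration, and stemming is left out because it would require an external library.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}


def preprocess(text: str) -> str:
    """Lower-case, strip punctuation and digits, drop stop words, squeeze spaces."""
    text = text.lower()
    # Remove punctuation characters without inserting a replacement.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove numeric values.
    text = re.sub(r"\d+", "", text)
    # split() with no argument also collapses any extra white space.
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)
```

Applied to a headline such as "The 25 new cases, reported in Delhi!", the function keeps only the content words in lower case.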

Topic Modeling
In natural language processing, topic modeling is an unsupervised machine learning approach that scans a set of documents, identifies words and the similarities between them, and automatically clusters together similar words that best describe the documents¹⁰. The clusters are known as topics and provide an abstract view of the documents. There are four main methods for implementing topic modeling: Latent Semantic Analysis (Deerwester, 1990), Probabilistic Latent Semantic Analysis (Hofmann, 1999), Latent Dirichlet Allocation (LDA), and the Correlated Topic Model. In this study, we use the LDA technique, as it performs better than the other topic modeling techniques (Chehal et al., 2020).

Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) is a generative probabilistic model of a corpus (Blei, 2003). It is a three-level hierarchical Bayesian mixture model whose main goal is to map each document in the corpus to the topics that cover most of the words in the document. Each document is represented as a mixture over underlying topic probabilities, which provides an explicit representation of the document. The functioning of LDA is shown in Figure 4 (Hidayatullah et al., 2019).
The basic notation used to represent the LDA model is given below:
• T denotes the number of documents in a given corpus.
• N is the number of words in a given document (document i has N_i words).
• K is the number of topics.
• α is the parameter of the Dirichlet prior on the per-document topic distributions.
• β is the parameter of the Dirichlet prior on the per-topic word distribution.
• θ_i is the topic distribution for document i.
• ψ_k is the word distribution for topic k.
• x_ij is the topic for the jth word in document i.
• w_ij is a specific word.
The generative model of LDA proceeds in three main stages:
1. Choose θ_i ~ Dir(α), where i ∈ {1, …, T} and Dir(α) is a Dirichlet distribution with symmetric parameter α.
2. Choose ψ_k ~ Dir(β), where k ∈ {1, …, K}.
3. For each word position (i, j), where i ∈ {1, …, T} and j ∈ {1, …, N_i}, choose a topic x_ij ~ Multinomial(θ_i) and then a word w_ij ~ Multinomial(ψ_{x_ij}).
One study notes that LDA is an appropriate method for studying news media coverage (Daud et al., 2010). Another used it to identify 16 frames in European refugee crisis news across five countries (Heidenreich et al., 2019). Poirier et al. (2020) applied LDA to identify six news frames (Chinese outbreak, economic crisis, health crisis, helping Canadians, social impact, Western deterioration) from 12 Canadian media sources. LDA topic modeling is applied in this research because it helps in identifying the prominent topics covered in the newspapers.
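The generative process can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: it follows the standard LDA formulation (draw a topic distribution per document, a word distribution per topic, and then a topic and a word for every word position), and the names `sample_dirichlet` and `generate_corpus` are hypothetical helpers, not part of any library used in the study.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(doc_lengths, vocab, num_topics, alpha, beta, seed=0):
    """Simulate documents from the LDA generative process."""
    rng = random.Random(seed)
    # One word distribution psi_k ~ Dir(beta) per topic.
    psi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    corpus = []
    for n_i in doc_lengths:
        # Topic distribution theta_i ~ Dir(alpha) for document i.
        theta = sample_dirichlet(alpha, num_topics, rng)
        words = []
        for _ in range(n_i):
            # Topic x_ij ~ Multinomial(theta_i), then word w_ij ~ Multinomial(psi_x).
            x_ij = rng.choices(range(num_topics), weights=theta)[0]
            w_ij = rng.choices(vocab, weights=psi[x_ij])[0]
            words.append(w_ij)
        corpus.append(words)
    return corpus
```

Fitting LDA runs this process in reverse: given only the observed words, it infers the topic and word distributions that most plausibly generated them.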

The Optimal Number of Topics
For the appropriate application of the LDA algorithm, it is important to identify a meaningful number of topics (Arun et al., 2010). A low value of K (the number of topics) can result in few or overly broad topics, whereas a high value of K results in uninterpretable topics. We have used the topic coherence method to find the optimal number of topics. Topic coherence scores a single topic by measuring the degree of semantic similarity between the high-scoring words in the topic. In this study, the C_V coherence method (Roder et al., 2015) is used, which is considered to have the highest correlation with human interpretation. Topic coherence is computed in four stages.
• Segmentation: the top-n words W of a topic are split into pairs of subsets, as defined in Eq. 1:
\[ S = \{ (W', W^{*}) \mid W' = \{w_i\};\ w_i \in W;\ W^{*} = W \} \tag{1} \]
• Probability estimation: this stage defines how probabilities are derived from the given data source. Here the Boolean sliding window method is used, in which words are counted using a sliding window. The window moves over the documents one word token per step, and each step creates a new virtual document of size s.
• Confirmation measure: for every segmented pair, a confirmation measure (φ) is calculated that measures how strongly W* and W' are correlated, based on the similarity of W' and W* with respect to all words in W. The similarity is calculated using the context vectors v(W') and v(W*), represented in Eq. 2. Normalized pointwise mutual information (NPMI) is calculated to check the agreement between w_i and w_j, as in Eq. 3, where ε is a small constant that avoids taking the logarithm of zero. Finally, the confirmation measure (φ) for each segmented pair S_i is calculated as the cosine similarity of the context vectors u = v(W') and w = v(W*), as in Eq. 4:
\[ \vec{v}(W') = \Big\{ \sum_{w_i \in W'} \mathrm{NPMI}(w_i, w_j) \Big\}_{j = 1, \dots, |W|} \tag{2} \]
\[ \mathrm{NPMI}(w_i, w_j) = \frac{\log \dfrac{P(w_i, w_j) + \epsilon}{P(w_i)\, P(w_j)}}{-\log\big(P(w_i, w_j) + \epsilon\big)} \tag{3} \]
\[ \varphi_{S_i}(\vec{u}, \vec{w}) = \frac{\sum_{j=1}^{|W|} u_j\, w_j}{\lVert \vec{u} \rVert_2 \, \lVert \vec{w} \rVert_2} \tag{4} \]
• Aggregation: the final coherence score is the arithmetic mean of the confirmation measures of all segmented pairs.
In this study, the 'c_v' metric implemented in the "gensim" library is used to calculate the coherence score. The coherence score is computed for different numbers of topics, and a plot of coherence score versus number of topics is drawn to find the optimal number of topics.
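To make the probability estimation and NPMI steps more tangible, a minimal Python sketch is given here. It is an illustration only, not the implementation used in the study: the helper names `sliding_windows` and `npmi` and the default constant `eps` are assumptions, and the context-vector and cosine steps of the full C_V measure are omitted for brevity.

```python
import math


def sliding_windows(tokens, size):
    """Boolean sliding window: each window is a set, so repeated words count once."""
    if len(tokens) <= size:
        return [set(tokens)]
    return [set(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def npmi(w_i, w_j, windows, eps=1e-12):
    """Normalized pointwise mutual information of two words over the windows."""
    total = len(windows)
    p_i = sum(w_i in win for win in windows) / total
    p_j = sum(w_j in win for win in windows) / total
    p_ij = sum(w_i in win and w_j in win for win in windows) / total
    if p_i == 0 or p_j == 0:
        # Convention of this sketch: an unseen word carries no evidence.
        return 0.0
    return math.log((p_ij + eps) / (p_i * p_j)) / -math.log(p_ij + eps)
```

Words that tend to share a window receive a positive score, while words that never co-occur receive a score close to -1, which is why a topic whose top words often appear together ends up with higher coherence.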

Visualization of Topics
Topic visualization is another important aspect of topic modeling for understanding the documents. In this study, the topics are visualized using the LDAvis library (Sievert and Shirley, 2014) in RStudio. This library helps in interpreting the topics by extracting information from the fitted LDA topic model into an interactive web-based visualization¹¹ (Fig. 5). The visualization has two panels that play an important role in interpreting the topics. The left panel presents the global view of the topics. The topics are shown as circles in a two-dimensional plane, where the circle centers are determined by computing the distances between topics and projecting them onto two dimensions using multidimensional scaling, as presented in Figure 5. Topic prevalence is encoded by the area of the circle: topics with a larger area are more prevalent. The right panel shows a bar chart in which the bars represent the frequencies of the terms that help in describing a topic. The red bars depict the topic-specific frequency of each term, whereas the bluish-grey bars represent its corpus-wide frequency. Useful terms are ranked using a metric known as relevance, which is controlled by a weight parameter λ. Relevance (Sievert and Shirley, 2014) reflects the degree to which a term appears in a topic to the exclusion of other topics. Sievert and Shirley (2014) found that the "optimal" value of λ was about 0.6, which resulted in an estimated 70% probability of correct topic identification, whereas for values of λ near 0 and 1 the estimated proportions of correct responses were about 53% and 63%, respectively. We therefore set λ to 0.6 in our study.
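The role of λ can be illustrated with the relevance formula of Sievert and Shirley (2014), which weights the log probability of a term within a topic against the log of its lift (the within-topic probability divided by the corpus-wide probability). The sketch below is a minimal illustration; the function names and the example probabilities are assumptions and do not come from the study's fitted model.

```python
import math


def relevance(p_term_in_topic, p_term_in_corpus, lam=0.6):
    """Relevance = lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w)."""
    lift = p_term_in_topic / p_term_in_corpus
    return lam * math.log(p_term_in_topic) + (1 - lam) * math.log(lift)


def rank_terms(topic_probs, corpus_probs, lam=0.6):
    """Order the terms of one topic from most to least relevant."""
    return sorted(
        topic_probs,
        key=lambda term: relevance(topic_probs[term], corpus_probs[term], lam),
        reverse=True,
    )
```

With λ = 1 the ranking reduces to the plain within-topic probability, and with λ = 0 it reduces to lift alone; λ = 0.6 sits between the two, favoring frequent terms while still rewarding terms that are specific to the topic.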

Sentiment Analysis
Sentiment analysis in the field of Natural Language Processing (NLP) is used to extract the sentiments expressed in opinions or reviews on a particular matter¹². Opinion mining, sentiment extraction, and subjectivity analysis (Chandni et al., 2015) are terms used interchangeably for sentiment analysis. Sentiment analysis is classified into seven dimensions: subjectivity classification, sentiment classification, review usefulness measurement, opinion spam detection, lexicon creation, aspect extraction, and polarity determination (Ravi and Ravi, 2015). Polarity determination, which is carried out in our study, finds whether a sentence is positive, negative, or neutral; it is of the utmost importance as it helps in analyzing people's behavior. Polarity analysis is performed using the "sentimentr" library in RStudio¹³. The "sentimentr" library provides better results than a simple dictionary lookup approach. Another advantage is that it is very simple to use, requiring only a few lines of code. Moreover, it compensates for inversions¹⁴. It relies on a list of words and phrases with positive and negative connotations and takes into account valence shifters, which include negators, amplifiers, de-amplifiers, and adversative conjunctions. The algorithm first uses a sentiment dictionary to tag polarized words and then uses an equation to assign a polarity value to each sentence¹⁵. The notation used in this algorithm is shown in Table 2.
Each paragraph P_i consists of sentences, as in Eq. 5, and every sentence is further broken down into words, represented by W, as in Eq. 6. The polarity of each sentence is then calculated using Eq. 7. The main reason for performing polarity analysis is to observe the change in the sentiments aroused by the news articles between the pre- and peri-Covid-19 periods.
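The idea of tagging polarized words and adjusting them with valence shifters can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the "sentimentr" algorithm: the word lists, the two-word context window, the shifter weight of 0.8, and the normalization by the square root of the sentence length are all assumptions of this sketch.

```python
import math
import re

# Tiny hypothetical lexicon and shifter lists, for illustration only.
POLARITY = {"good": 1.0, "recover": 1.0, "fear": -1.0, "death": -1.0}
NEGATORS = {"not", "no", "never"}
AMPLIFIERS = {"very", "sharply"}
DEAMPLIFIERS = {"slightly", "barely"}


def sentence_polarity(sentence, weight=0.8):
    """Score one sentence: tag polarized words, then apply nearby valence shifters."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    score = 0.0
    for k, word in enumerate(words):
        if word not in POLARITY:
            continue
        context = words[max(0, k - 2):k]  # the two words before the polarized word
        scale = 1.0
        for shifter in context:
            if shifter in AMPLIFIERS:
                scale += weight
            elif shifter in DEAMPLIFIERS:
                scale -= weight
        value = POLARITY[word] * max(scale, 0.0)
        # An odd number of negators flips the sign of the polarized word.
        if sum(shifter in NEGATORS for shifter in context) % 2 == 1:
            value = -value
        score += value
    return score / math.sqrt(len(words))
```

A plain dictionary lookup would score "the news is not good" as positive because it contains "good"; handling the negator is what turns the score negative here.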

Change in Health Issues Coverage Printed on the E-Paper's Front Page During the Pre-and-Peri Coronavirus Period Using the LDA Topic Modeling Technique
The issues that are prominent on the front page of the newspapers during the pre- and peri-coronavirus periods are identified using Latent Dirichlet Allocation. The analysis is divided into two parts: the pre-coronavirus period (July 2019 - Dec 2019) and the peri-coronavirus period (Jan 2020 - June 2020).
• Pre-Coronavirus Period:
  • Optimal number of topics: For the pre-Covid-19 dataset, the best number of topics according to topic coherence is 15, as shown in Figure 6. The number of topics with the highest coherence is taken as optimal. From the graph, it can be observed that there is a sharp increase in coherence at K=15, after which the coherence decreases steadily.
  • Topics visualization: The visualization of the topics is shown in Figure 7, where the circles and the bar graphs of prevalent terms give an overall picture of the pre-coronavirus matters. Topics 1 to 15 cover the following matters, labeled "Political parties", "Criminal Cases", "Economy", "Maharashtra election", "Court Cases", "External matters", "Health Education", "Chidambaram case", "Weather", "Accidents", "Terrorism", "Election", "Chandrayaan mission", "CAA bill", and "Ram mandir", respectively. The topic labels are decided according to the frequent terms occurring in each topic (refer to Table 3). During the pre-COVID-19 period, less emphasis was laid on health-related issues on the front page of the newspapers: only one topic, labeled "Health Education", relates to health. The leading topics during the pre-coronavirus period were politics, crime, and court cases.
• Peri-Coronavirus Period:
  • Optimal number of topics: The optimal number of topics is shown in Figure 8. From the plot, it is clear that K=25 is the best number of topics, as there is a sharp increase in topic coherence at K=25. Beyond that value, topic coherence still increases, but the change is very minor. If more than 25 topics were used, terms would repeat across topics, leading to similar topics.
  • Visualizing topics: The topics are shown in Figure 9, with circles on the left-hand side and the most prevalent terms on the right-hand side. The prevalence of a topic is shown by the area of its circle. Topic 1 has the largest area, so it is the most prevalent topic, and Topic 25 contributes the least, as it has the smallest area. The topic labels are decided according to the frequent terms occurring in each topic (refer to Table 4). The labels of topics 1 to 25 are "Government", "Covid-19 cases", "Economy", "Lockdown", "Medical Facilities", "India-China Conflict", "Criminal Cases", "Indian Cities", "PM Modi", "Political parties", "Migrant Workers", "Tablighi Jamaat", "Court Cases", "COVID-19 outbreak", "Covid-19 Testing", "Delhi government", "International News", "Weather", "Covid-19 treatment", "Technology", "Protestors", "Airlines", "Terrorism", "Defense", and "Parliament session", respectively. Topics 2, 4, 5, 9, 11, 13, 14, 15, and 19 represent coronavirus-related matters, which consist of testing, cases, lockdown, vaccine development, plasma treatment, migrant workers during the lockdown, the Tablighi Jamaat COVID-19 case, and health care facilities. Parts of Topic 1 and Topic 3 also discuss the coronavirus situation, as they include the steps taken by the government and the effect on the economy during the lockdown. Thus nearly 40% of the topics concern COVID-19 and its impacts, which indicates that the focus of the newspapers has shifted to warning people about health-related problems, as the situation around the globe is very alarming with the outbreak.
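How the "nearly 40%" figure was obtained is not spelled out, so the following short calculation is only one plausible reading. It counts the nine topics listed above as fully COVID-related and treats Topics 1 and 3 as partly related; the helper name `topic_share` is an assumption introduced for illustration.

```python
def topic_share(topic_count, total_topics):
    """Percentage of the fitted topics that fall into a given group."""
    return 100 * topic_count / total_topics


covid_topics = [2, 4, 5, 9, 11, 13, 14, 15, 19]  # topics listed as coronavirus-related
partly_covid_topics = [1, 3]                      # topics that partly discuss the pandemic
total_topics = 25

# Lower bound: only the fully COVID-related topics.
lower_bound = topic_share(len(covid_topics), total_topics)
# Upper bound: also counting the partly related topics in full.
upper_bound = topic_share(len(covid_topics) + len(partly_covid_topics), total_topics)
```

Under this reading the share lies between 36% and 44%, which is consistent with the reported figure of nearly 40%.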

Examining the Change in the Sentiments Aroused From News Articles During the Pre- and Peri-Covid-19 Era Using Sentiment Analysis
Sentiment is examined using the polarity scores of the news articles of the pre- (Figure 10) and peri- (Figure 11) coronavirus periods. The sentiments are grouped into three categories: positive, negative, and neutral. A total polarity score greater than 0 is considered positive, a score less than 0 negative, and a score of 0 neutral (Kawathekar and Kshirsagar, 2012). For the pre-coronavirus period, negative polarity is represented by red diamonds, positive polarity by green rectangles, and neutral polarity by blue triangles. Positive polarity among the news articles was mainly due to the topics Chandrayaan mission, Economy, Health Education, Political Parties, and the Ram mandir establishment in Ayodhya, whereas negative sentiment was due to Terrorism, Protests against the CAA bill, Crime, and Accidents. Neutral sentiment is evoked by the topics labeled Weather, Maharashtra and Jharkhand election, and External matters. For the peri-coronavirus period, negative polarity is highlighted with muted red diamonds, positive polarity with medium green rectangles, and zero polarity with purple triangles. During the peri-coronavirus period, the negative sentiment is due to the COVID-19-related topics, which include the increase in the tally of active cases, the effect of the lockdown on the economy of India, and the effects on daily wage workers. Other sources of negative sentiment are the India-China conflict, criminal cases, and protestors. Positive polarity is associated with the topics PM Modi, Medical facilities, Technology, Defense, International news, and Delhi government. Neutral sentiment results from the topics Weather and Parliament session.
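The thresholding rule described above, and the way class shares follow from it, can be written as a short Python sketch. The function names and the example scores are assumptions made for illustration; they are not the study's data.

```python
from collections import Counter


def classify(score):
    """Map a polarity score to a sentiment class using zero as the threshold."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_shares(scores):
    """Percentage of scores falling into each sentiment class."""
    labels = ("positive", "negative", "neutral")
    if not scores:
        return {label: 0.0 for label in labels}
    counts = Counter(classify(s) for s in scores)
    return {label: 100 * counts[label] / len(scores) for label in labels}
```

Running `sentiment_shares` separately on the polarity scores of each period gives the kind of positive, negative, and neutral percentages that are compared between the two periods.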
Further, the overall change in sentiment between the topics of the pre- and peri-COVID-19 periods is examined (Figure 12). About 54% of the polarity scores are greater than zero for the pre-COVID-19 topics, whereas the graph shows a decrease in the share of positive polarity scores to 46% in the peri-coronavirus period. Similarly, the share of negative polarity is higher for the peri-coronavirus topics, at about 53%, compared with about 44% for the pre-coronavirus topics. Neutral polarity scores contribute the least for the pre- and peri-coronavirus topics, with values of 0.8% and 0.5%, respectively. The main causes of the increase in negative sentiment during the coronavirus period are the sudden outbreak, the increase in deaths, and the restrictions imposed during the lockdown, such as social distancing (refer to Table 5) (Mishra and Majumdar, 2020). From Table 5, it is clear that most of the negative polarity is due to Covid-related news. These results imply the need to formulate policies for improving the health sector of India, as a lack of awareness can be observed.

DISCUSSION
In this work, we analyzed the change in health-issue coverage in two Indian newspapers between the pre- and peri-coronavirus periods. The front-page news articles for July 2019-June 2020 were collected and the LDA topic modeling technique was applied. The topics extracted by LDA show that little stress was laid on making people aware of health issues during the pre-coronavirus period (Fig. 13). News covered during this period concerned politics, court cases, crime, the economy, etc. Only 6% of news articles laid stress on health education. By contrast, nearly 40% of the topics in the peri-coronavirus period cover health-related issues, which shows that the emphasis is now on warning people. Most of these topics were about the COVID-19 pandemic, including the COVID-19 outbreak, COVID-19 cases, COVID-19 testing, medical facilities, the lockdown, and the various guidelines imposed by the government to overcome the pandemic. These observations imply that the health of the people is now given more priority, as awareness is being spread and proper actions are being taken to control the pandemic.
The results indicate that health issues come into the limelight on the front page during critical times. In both periods, politics is covered to a great extent, under topics named "Political parties", "Maharashtra election", "Election", "Government", "Delhi Government", "Parliament session", "CAA Bill", and "PM Modi". Crime is likewise prominent: "Criminal cases" is a common topic in both periods (refer to Table 3 and Table 4). Moreover, terrorism-related news is also given priority, as it appears in both periods under the topic "Terrorism". The media should give equal priority to the coverage of health issues, which will ultimately lead to the improvement of the Indian healthcare system as people become more aware. The government should formulate initiatives to improve the health care system of India. Further, the change in sentiments between the two periods is observed using the polarity score. The results reveal that negative sentiment is prominent in the peri-coronavirus period. About 52% of the polarity scores are negative in the peri-coronavirus period, whereas the figure was 46% during the pre-coronavirus period. The main causes of the increase in negative polarity are the sudden outbreak of the virus and the restrictions imposed on people, such as social distancing, wearing masks, and the lockdown period during which people were made to stay in their homes. The outbreak of the virus was sudden, and people are in a state of fear and mentally disturbed. Since negative sentiment has increased in the peri-Covid-19 period, the media should devise suitable headlines for covering the spread of Covid-19 that place less mental stress on people. In previous work, Gupta and Sinha (2010) inspected health coverage in mass media. Their analysis was done manually, which takes a lot of time.
Our study overcomes this limitation by using LDA topic modeling, which automatically extracts the topics covered and is comparatively faster. Liu et al. (2020) analyzed the role of media in the ongoing COVID-19 crisis. The main drawback of that study is that it covers only news related to COVID-19 and contains only Chinese news articles, scraped from the WiseSearch mass media. It also lacks sentiment analysis of the news articles. Our research overcomes these drawbacks, as we collected news articles from the front pages of the e-papers, which contain national as well as international news. Further, the news articles collected for our study come from different areas, including politics, business, government, and court cases. Moreover, we have analyzed the change in the sentiments aroused by the news articles.

CONCLUSION
The healthcare sector of India is considered unfit when compared with other countries' medical facilities, due to the lack of awareness among the public. In this study, we analyzed the change in health-issue coverage on the front pages of two Indian newspapers, The Hindustan Times and The Times of India, between the pre- and peri-coronavirus periods. The results show that the newspapers now focus on warning people about the pandemic by reporting the total number of active cases, providing prevention guidelines, and publishing various other articles that make people aware of the alarming situation that has devastated the whole world. Further, the sentiment analysis of the news articles shows an increase in negative sentiment during the peri-coronavirus period, which indicates fear due to the sudden outbreak of the pandemic. The theoretical implication of this research is that the study fills gaps in empirical knowledge about the factors that trigger people's sentiments. It also points to psychological effects, indicating how news can bring about behavioral change. Moreover, the study has practical implications for researchers, health workers, and government bodies interested in using the news media to disseminate knowledge about health issues. Further, it will help the government in improving the Indian health sector and in initiating programs to help people come out of this pandemic situation, as people are at high risk of mental illness. In the future, we would like to combine posts from social media platforms such as Facebook, Instagram, and Twitter with news articles from different e-papers to analyze the situation. We would also like to analyze the state-wise pandemic situation in India and to consider news articles from the whole newspaper rather than the front page only.

FUNDING AGENCY
The publisher has waived the Open Access Processing fee for this article.