Introduction
The goal of this short paper is to demonstrate the power of simple models to fit complex data and to serve as tools for comparing data sets to inform decision-making. We do this in the context of data on the number of new cases of COVID-19 in New York State (NYS) in the United States. Granville (2020) has demonstrated that population density plays an important role in the way this virus impacts each state. Because of the immense difference in population density and behavior, New York City (NYC) is therefore excluded from the data and from this analysis. In January 2020, there were over 19.4 million residents in the 47,126 square miles of NYS, including NYC. Of those residents, nearly 43% lived in NYC (World Population Review, 2020), which occupies only 305 square miles of the state and is the most densely populated city in the United States. This population density requires, at the least, different model parameters than would be appropriate for modeling the spread of the virus in the rest of the state. Although we could explicitly build a model in which population density is a factor, we restrict ourselves to a simpler class of models in this analysis.
As we will see, from March 4, 2020 to June 26, 2020, new cases of COVID-19 in NYS went through three phases, each of which can be modeled using a discrete-time dynamical system. We take the population of NYS outside NYC at the start of the pandemic to be 11,103,683 (19,440,500 in the entire state minus 8,336,817 in NYC proper). These population figures are based on World Population Review (2020) and the Wikipedia entry on New York (2020). Although many more sophisticated models could be developed to account for a variety of factors, we demonstrate that a high degree of fidelity can be achieved using simple models.
After a brief overview of the discrete SIR model, we discuss some of the issues with the data on coronavirus cases in NYS. We then discuss three measures of fitness for comparing the model to the data and use both Excel’s Solver and the optim function in R to train the model. We discuss the analysis and implications of the models, and then use the trained model to analyze the caseload data for additional states and for the United States as a whole.
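To make the idea of a discrete-time SIR model concrete, the following is a minimal sketch in Python of the standard textbook form. It is not necessarily the exact formulation developed later in the paper. The population figures are the ones quoted above. The function names `sir_step` and `simulate`, the parameter values for `beta` and `gamma`, and the initial number of infected individuals are all assumptions made only for illustration.

```python
# Population of NYS outside NYC, using the figures quoted in the text.
NYS_TOTAL = 19_440_500
NYC_PROPER = 8_336_817
N = NYS_TOTAL - NYC_PROPER  # 11,103,683


def sir_step(s, i, r, beta, gamma, n=N):
    """Advance a textbook discrete-time SIR model by one day.

    beta is the daily transmission rate and gamma the daily recovery
    rate; both are hypothetical here, not fitted values.
    """
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries,
            new_infections)


def simulate(days, i0, beta, gamma, n=N):
    """Return the modeled number of new cases for each simulated day."""
    s, i, r = n - i0, float(i0), 0.0
    new_cases = []
    for _ in range(days):
        s, i, r, new = sir_step(s, i, r, beta, gamma, n)
        new_cases.append(new)
    return new_cases


# Hypothetical parameter values, chosen only to make the sketch run.
cases = simulate(days=30, i0=10, beta=0.3, gamma=0.1)
```

Training such a model amounts to choosing `beta` and `gamma` so that the simulated daily new cases are as close as possible to the reported ones under a chosen measure of fit, which is the role played by a general-purpose optimizer.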