Lifespan Prediction Using Socio-Economic Data Using Machine Learning


Veysel Gökhan Aydin, Elif Bulut
DOI: 10.4018/978-1-6684-4045-2.ch002

Abstract

Average life expectancy can vary among regions within the same society as well as among countries. In this study, a multiple linear regression model and a support vector regression model were built using selected economic and social variables of countries. Data for 32 countries for the years 2017 and 2018 were compiled, and the aim was to compare the prediction performance of support vector regression and multiple linear regression. Support vector regression was applied with radial basis function, linear, polynomial, and sigmoid kernels. Multiple linear regression was fitted by the least squares method. The models were compared on the basis of error bound accuracy rates, and the predictions were also examined graphically in order to determine the best model.

Introduction

Regression analysis, in its most basic form, is used to determine whether a relationship exists between the dependent variable(s) and the independent variable(s), as well as the direction and strength of that relationship. Once the relationships among the variables are defined, predictions can be made (Özdamar, 2013).

The literature describes numerous types of regression analysis. The analysis takes different names depending on the number of dependent and independent variables or on the structure of the data (Özdamar, 2013).

Today, regression models based on more advanced machine learning methods exist alongside those based on classical statistical methods. As technology has developed, new regression techniques that can predict much better than the classical regression models have been developed, and this development continues.

Average life expectancy differs according to countries’ economic and social development levels. In general, developed countries have a longer average life expectancy, while less developed countries have a relatively shorter one (Jetter, Laudage & Stadelmann, 2019). In particular, if economic growth is accepted to affect average life expectancy directly, life expectancy lengthens as economic growth increases. Economic growth also means a rise in income level and further development through investment in health. Besides economic growth, variables such as the natural conditions of the geography in which people live and their habits can all affect average life expectancy. Air pollution and habits such as the consumption of tobacco products and alcohol are also known to harm human health.

Average life expectancy can vary among regions within the same society as well as among countries. In this study, a multiple linear regression model and a support vector regression model were built using selected economic and social variables of countries. Data for 32 countries for the years 2017 and 2018 were compiled, and the aim was to determine which model performed better. Similar studies on average life expectancy exist. They have mostly used a multiple linear regression model with fewer independent variables, and they were generally applied to smaller datasets.

A significant change in a country's average life expectancy is not expected within a year. Some of the independent variables, on the other hand, may change significantly from year to year. By compiling data over two years, this study intended the models to be more sensitive to the very small changes in the dependent variable that accompany the average change in the independent variables.

The main purpose of this study is to compare the prediction performance of support vector regression and multiple linear regression. For this purpose, average life expectancy and socio-economic data for two years were compiled from the databases of the Organisation for Economic Co-operation and Development (OECD) and the World Bank. Support vector regression was applied with radial basis function, linear, polynomial, and sigmoid kernels. Multiple linear regression was fitted by the least squares method, and the results were compared. The comparison was based on error bound accuracy rates calculated for each model. The predictions were also examined graphically in order to determine the best model.
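The comparison described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' actual pipeline. The data are synthetic stand-ins sized like 32 countries over two years. The five predictors, the train/test split, the feature scaling, the default SVR hyperparameters, and the 1-year error bound are all assumptions made for the example. The helper `error_bound_accuracy` is a hypothetical name, and it implements one plausible reading of "error bound accuracy rate": the share of predictions whose absolute error falls within a chosen bound.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def error_bound_accuracy(y_true, y_pred, bound):
    """Share of predictions with absolute error <= bound (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) <= bound))


# Synthetic stand-in data: 64 rows (32 countries x 2 years), 5 made-up predictors.
# None of these values come from the OECD or World Bank databases.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = 78.0 + X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(scale=0.5, size=64)

# Hold out a quarter of the rows for evaluation (an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Multiple linear regression fitted by ordinary least squares,
# plus one SVR per kernel named in the text.
models = {"MLR (least squares)": LinearRegression()}
for kernel in ("rbf", "linear", "poly", "sigmoid"):
    # Standardizing the inputs before SVR is common practice, assumed here.
    models[f"SVR ({kernel})"] = make_pipeline(StandardScaler(), SVR(kernel=kernel))

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = error_bound_accuracy(y_test, predictions, bound=1.0)

for name, rate in results.items():
    print(f"{name}: {rate:.2f} of test predictions within 1 year")
```

Because the data here are random, the printed rates say nothing about which model is better on real life expectancy data. The sketch only shows how five models can be scored against the same error bound.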

The methods section explains the mathematical background of the methods used in the research. Support vector regression was studied with four different kernels. Eight different comparison methods were used to evaluate the models, and their formulas are stated in the relevant section. The data description section explains the source of the data, the number of observations, and the variables. The findings of the research are then examined: descriptive statistics for the dataset are presented, and the analysis results are compared. The conclusion and recommendations section evaluates the findings and offers suggestions for similar studies.
