Cardiovascular Risk Detection Through Big Data Analysis

Miguel A. Sánchez-Acevedo, Zaydi Anaí Acosta-Chi, Ma. del Rocío Morales-Salgado
Copyright: © 2020 | Pages: 11
DOI: 10.4018/IJBDAH.2020070101


Cardiovascular diseases are the main cause of mortality in the world. As more people suffer from diabetes and hypertension, the risk of cardiovascular disease (CVD) increases. A sedentary lifestyle, an unhealthy diet, and stressful activities are behaviors that can be changed to prevent CVD. Taking measures to prevent CVD lowers the cost of treatments and reduces mortality. Data-driven plans generate more effective results and can be applied to groups with similar characteristics. Currently, there are several databases that can be used to extract information in real time and improve decision making. This article proposes a methodology for the detection of CVD and a web tool to analyze the data more effectively. The methodology for extracting, describing, and visualizing data from a state-level case study of CVD in Mexico is presented. The data is obtained from the databases of the National Institute of Statistics and Geography (INEGI) and the National Survey of Health and Nutrition (ENSANUT). A k-nearest neighbor (KNN) algorithm is proposed to predict missing data.
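To illustrate how a KNN approach might predict a missing value, the sketch below fills a gap in one column with the mean of that column over the k most similar complete records. It is only a minimal sketch under stated assumptions: the function name `knn_impute`, the columns (age, body mass index, systolic pressure), the sample values, the Euclidean distance, and k = 2 are all hypothetical and are not taken from the article or from the INEGI or ENSANUT databases.

```python
import math


def knn_impute(rows, target, k=3):
    """Fill None values in column `target` with the mean of that column
    over the k nearest complete rows (Euclidean distance on other columns).

    Assumes the non-target columns contain no missing values.
    """
    features = [i for i in range(len(rows[0])) if i != target]
    complete = [r for r in rows if r[target] is not None]
    if not complete:
        raise ValueError("no complete rows to learn from")

    filled = []
    for row in rows:
        new_row = list(row)
        if row[target] is None:
            point = [row[i] for i in features]
            # Sort complete rows by distance to the incomplete one.
            nearest = sorted(
                complete,
                key=lambda c: math.dist(point, [c[i] for i in features]),
            )[:k]
            new_row[target] = sum(n[target] for n in nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Hypothetical records: [age, body mass index, systolic pressure]
rows = [
    [45, 27.0, 130.0],
    [50, 31.0, 142.0],
    [47, 28.0, None],   # systolic pressure is missing
    [30, 22.0, 115.0],
    [62, 33.0, 150.0],
]

filled = knn_impute(rows, target=2, k=2)
print(filled[2])
```

In this toy example the two nearest complete records are the first two, so the gap is filled with the mean of their values. In practice the features would usually be scaled before computing distances, since unscaled columns with large ranges dominate the Euclidean distance.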

According to the World Health Organization (WHO, 2018), ischaemic heart disease and stroke are the leading causes of death in the world; the organization estimates that 15.2 million people died from these diseases in 2016. The main risk factors for cardiovascular disease (CVD) are hypertension, dyslipidemia, abnormal glucose levels, and diabetes (Anstiss & Passmore, 2020). Despite its high mortality, CVD can be prevented through behavioral changes such as physical activity, a healthy diet, limiting tobacco exposure, restricting alcohol consumption, and reducing stress (Hooker, 2013). Changing people's lifestyles is a major challenge because of the increase in sedentary work and the consumption of high-calorie foods, so dedicated attention and training for CVD prevention are required (Saeed et al., 2018). Plans and actions to reduce the risk of CVD can be improved by analyzing the data generated every day from the treatment of people with CVD, including the origin and evolution of their cases.

Multiple databases around the world are fed with CVD data every day. The Centers for Disease Control and Prevention (CDC, 2020), the National Survey of Health and Nutrition (ENSANUT, 2020), the National Cardiovascular Disease Database (NCVD, 2020), and the World Health Organization (WHO, 2020) are among the sources that keep databases available online. Data stored in public databases can be supplemented with patient records, clinical analyses, diagnoses, and data collected from mobile devices that people use to track their treatments. Big data analysis tools can reveal new correlations in these data. Such analysis can help identify common characteristics among people with CVD, and the results could help select higher-impact actions for countries where a lack of resources restricts the number of clinical analyses that can be performed on the population.

The term Big Data was coined by Gartner Group in 2008 (Emmanuel & Stanier, 2016) to refer to the large amount (volume) of data that is generated continuously (velocity) from several sources (variety) every day (Edward & Sabharwal, 2015). To analyze the collected data, the data are first explored, then described, and finally visualized so that the content can be explored interactively; in this way, users can identify patterns and infer correlations to support decisions (Bikakis, Papastefanatos, & Papaemmanouil, 2019). Good results in the analysis phase depend on the quality of the data, which is assessed by evaluating the accuracy, completeness, consistency, distinctness, precision, timeliness, and volume of the data (Cappiello, Samá, & Vitali, 2018). After data preparation, a set of analysis techniques can be applied to discover relevant information; the most used techniques are pattern matching, classification, clustering, and regression (Mogha, Ahlawat, & Singh, 2018). Together, the stages of big data processing generate new insights that can be used to take action on the problem under study.
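Two of the quality dimensions listed above, completeness and distinctness, lend themselves to a small worked example. The sketch below uses simplified working definitions chosen for illustration (share of non-missing cells, and share of unique records); these definitions, the function names, and the sample records are assumptions and are not drawn from Cappiello, Samá, and Vitali (2018) or from the article's case study.

```python
def completeness(records):
    """Share of cells that are not missing (None), between 0 and 1."""
    cells = [value for rec in records for value in rec.values()]
    return sum(value is not None for value in cells) / len(cells)


def distinctness(records):
    """Share of records that are unique, between 0 and 1."""
    unique = {tuple(sorted(rec.items())) for rec in records}
    return len(unique) / len(records)


# Hypothetical survey records; the second one duplicates the first.
records = [
    {"state": "Oaxaca", "age": 54, "hypertension": True},
    {"state": "Oaxaca", "age": 54, "hypertension": True},
    {"state": "Puebla", "age": None, "hypertension": False},
    {"state": "Puebla", "age": 61, "hypertension": None},
]

print(f"completeness: {completeness(records):.2f}")
print(f"distinctness: {distinctness(records):.2f}")
```

Low scores on checks like these would suggest cleaning or imputing the data before applying classification, clustering, or regression.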

According to Hong et al. (2018), Big Data in healthcare can be divided into four categories: medicine and clinics, public health and behavior, medical experiments, and medical literature. In the first category, data are extracted from electronic health records (Hoffman, 2016), electronic medical records (Setiawan, Utami, Mengko, & Indrayanto, 2014), personal health records (Okore, Bakyarani, & Phil, 2015), and medical images (Tahmassebi et al., 2019). Data in the second category are collected from electrocardiograms (Cipresso, Rundo, Conoci, & Parenti, 2019) and vital signs (Mohammad Forkan, Khalil, & Atiquzzaman, 2017). The third category obtains data from molecular biology (Cannataro, 2019), human body samples, and clinical trials (Mayo et al., 2017). Finally, medical literature data are obtained from structured knowledge and from journal and conference articles (Wang et al., 2018).
