Data Mining for Predicting Pre-diabetes: Comparing Two Approaches

Kambiz Farahmand (NDSU, Fargo, ND, USA), Guangjing You (NDSU, Fargo, ND, USA), Jing Shi (UC, Oakland, CA, USA) and Satpal Singh Wadhwa (NDSU, Fargo, ND, USA)
Copyright: © 2015 | Pages: 21
DOI: 10.4018/IJUDH.2015070103


Many individuals who are at risk for type 2 diabetes do not experience symptoms and are therefore unaware of their condition. Screening for type 2 diabetes can identify at-risk individuals and help prevent or delay complications. A total of 13 risk factors, out of 17 NHANES variables, were selected as predictors. In this study, a comparison of two data mining methodologies showed that Decision Tree has a higher ROC index than Logistic Regression modeling. The ROC indexes of both data mining models were greater than 77%, indicating that both methods provide good prediction of pre-diabetes. The final comparison indicated that Decision Tree modeling is the better predictor of pre-diabetes.
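The ROC index used to compare the two models is commonly defined as the area under the ROC curve (AUC): the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. Assuming that definition, a minimal sketch of how two models' scores could be compared is shown below. The function name `roc_index` and the six-case toy scores are hypothetical and invented for illustration; they are not the study's data or results.

```python
from typing import Sequence


def roc_index(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 = pre-diabetic (positive), 0 = not pre-diabetic (negative).
    scores: model-predicted probability of the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs that the model ranks correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores from two models on the same six cases.
labels = [1, 1, 1, 0, 0, 0]
tree_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5]
logit_scores = [0.7, 0.6, 0.3, 0.4, 0.2, 0.5]

print(f"Decision tree ROC index:       {roc_index(labels, tree_scores):.3f}")   # 0.889
print(f"Logistic regression ROC index: {roc_index(labels, logit_scores):.3f}")  # 0.778
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so an ROC index above 0.77 means the model ranks a pre-diabetic case above a non-pre-diabetic case in more than 77% of pairs.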


Diabetes is the fastest growing chronic disease in the world. In the United States, according to the Centers for Disease Control and Prevention (CDC) Diabetes report (2014), more than 29 million people, or 9.3% of the U.S. population, had diabetes in 2012. Of these, 21 million were diagnosed and 8.1 million were undiagnosed. Diabetes is a common chronic disease that occurs when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces. This leads to an increased concentration of glucose in the blood. There are two main types of diabetes:

  • Type 1 Diabetes Mellitus: Occurs when most or all of the insulin-producing beta cells in the pancreas have been destroyed, resulting in a severe lack of insulin in the body.

  • Type 2 Diabetes Mellitus: Occurs when the pancreas still produces insulin but the body cannot use it properly.

Type 1 diabetes often occurs in children and adolescents. However, type 2 diabetes is the most common form of diabetes: in adults, it accounts for about 90% to 95% of all diagnosed cases. Patients with type 2 diabetes require long-term health management plans (ADA, 2013). According to CDC statistics (2014), the total cost of diagnosed diabetes, including medical costs and lost work and wages, was $245 billion. These numbers show that type 2 diabetes has a significant financial impact. In this study, "diabetes" refers to type 2 diabetes unless another type is specified.

According to the National Diabetes Statistics Report (2014), 86 million Americans aged 20 years or older had pre-diabetes in 2012, meaning that more than 1 out of 3 adults had pre-diabetes. A person with pre-diabetes has a blood sugar level that is higher than normal but not high enough for a diagnosis of diabetes. These individuals are at higher risk of developing type 2 diabetes and other serious health problems, including heart disease and stroke. Without lifestyle changes to improve their health, 15% to 30% of people with pre-diabetes will develop type 2 diabetes within five years. However, 9 out of 10 adults who have pre-diabetes do not know they have it. Therefore, identifying individuals at high risk for pre-diabetes is an urgent need.

Effective diabetes screening could improve people's quality of life and reduce the cost of the health care system. Screening should be sequential, not a one-time event. However, deciding when and how to screen asymptomatic individuals is complex. One approach is to group patients who have the same condition and set a screening schedule for each group. Given these requirements, accurately predicting and diagnosing diabetes or pre-diabetes is vital for the healthcare system.

The objective of this study is to compare qualitative data mining models for predicting pre-diabetes. Data mining is the process of analyzing large-scale data in order to describe, understand, and predict trends in the data. This is why data mining technologies were used to analyze the constantly increasing volumes of diabetes data.


Literature Review

Among the type 2 diabetes screening guidelines, six (ADA, WHO, HIS, VA/DoD, ICSI and IDF) identified "BMI ≥25 kg/m2" as a risk factor for diabetes screening. Five guidelines used age as a screening criterion: ADA, VA/DoD and IDF recommend screening individuals aged ≥45 years, WHO suggests screening individuals aged ≥35 years, and CMS recommends screening individuals aged ≥65 years. Eight guidelines identified hypertension (>140/90 mmHg) as a risk factor for diabetes screening. For HDL cholesterol, VA/DoD used a threshold of <40 mg/dl, while the other seven guidelines used <35 mg/dl.
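The guideline thresholds summarized above can be expressed as a simple rule-based check. The sketch below is illustrative only and is not the authors' model: the `Patient` class and `screening_flags` function are hypothetical, ">140/90" is assumed to mean systolic above 140 or diastolic above 90 mmHg, and BMI and blood pressure are checked under every guideline even though, as noted above, only six and eight guidelines respectively use them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Minimum screening age per guideline, as summarized in the literature review.
# Guidelines not listed here do not use an age criterion.
SCREENING_AGE = {"ADA": 45, "VA/DoD": 45, "IDF": 45, "WHO": 35, "CMS": 65}


@dataclass
class Patient:  # hypothetical record type for illustration
    age: int          # years
    bmi: float        # kg/m2
    systolic: int     # mmHg
    diastolic: int    # mmHg
    hdl: float        # mg/dl


def screening_flags(p: Patient, guideline: str) -> list[str]:
    """Return the risk-factor thresholds this patient meets (simplified sketch)."""
    flags = []
    if p.bmi >= 25:
        flags.append("BMI >= 25 kg/m2")
    min_age = SCREENING_AGE.get(guideline)
    if min_age is not None and p.age >= min_age:
        flags.append(f"age >= {min_age}")
    # Assumption: ">140/90" is read as systolic > 140 OR diastolic > 90.
    if p.systolic > 140 or p.diastolic > 90:
        flags.append("hypertension (>140/90 mmHg)")
    # VA/DoD uses HDL < 40 mg/dl; the other guidelines use HDL < 35 mg/dl.
    hdl_cutoff = 40 if guideline == "VA/DoD" else 35
    if p.hdl < hdl_cutoff:
        flags.append(f"HDL < {hdl_cutoff} mg/dl")
    return flags


patient = Patient(age=50, bmi=27.5, systolic=132, diastolic=85, hdl=38)
print(screening_flags(patient, "ADA"))     # ['BMI >= 25 kg/m2', 'age >= 45']
print(screening_flags(patient, "VA/DoD"))  # ['BMI >= 25 kg/m2', 'age >= 45', 'HDL < 40 mg/dl']
print(screening_flags(patient, "CMS"))     # ['BMI >= 25 kg/m2']
```

The same patient is flagged differently depending on the guideline, which illustrates why fixed thresholds alone give inconsistent screening decisions.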
