Predictive Modeling of Surgical Site Infections Using Sparse Laboratory Data

Prabhu RV Shankar (University of California Davis, Sacramento, USA), Anupama Kesari (Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India), Priya Shalini (Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India), N. Kamalashree (Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India), Charan Bharadwaj (Sri Jayachamarajendra College of Engineering, Mysuru, India), Nitika Raj (Sri Jayachamarajendra College of Engineering, Mysuru, India), Sowrabha Srinivas (Sri Jayachamarajendra College of Engineering, Mysuru, India), Manu Shivakumar (State University of New York at Buffalo, Buffalo, USA), Anand Raj Ulle (Department of Information Science and Engineering, Sri Jayachamarajendra College of Engineering, Mysuru, India) and Nagabhushana N. Tagadur (Sri Jayachamarajendra College of Engineering, Mysuru, India)
Copyright: © 2018 | Pages: 14
DOI: 10.4018/IJBDAH.2018010102

Abstract

As part of a data mining competition, training and test sets of laboratory test data from patients with and without surgical site infection (SSI) were provided. The task was to develop predictive models using the training set and to identify patients with SSI in the unlabeled test set. Lab test results are vital resources that guide healthcare providers in making decisions about all aspects of surgical patient management. Many machine learning models were developed after pre-processing and imputing the lab test data; only the top-performing methods are discussed. Overall, Random Forest (RF) algorithms performed better than Support Vector Machine and Logistic Regression. Using a set of 74 lab tests, RF produced only 4 false positives in the training set and predicted 35 out of 50 SSI patients in the test set (Accuracy 0.86, Sensitivity 0.68, and Specificity 0.91). Optimal ways to address healthcare data quality concerns and imputation methods, as well as newer generalizable algorithms, need to be explored further to uncover new associations and knowledge linking laboratory biomarkers and SSI.
Article Preview

Laboratory Tests In The Era Of Big Data

Laboratory test results, such as blood counts, blood sugars (including metabolic panels), and inflammatory markers, are among the time-tested resources that help providers make decisions about patient selection, readiness for surgery, and intra- and post-operative management. These lab test results are indispensable for all aspects of patient management. The information content in lab test results could be a valuable resource for uncovering unknown associations with post-operative complications such as SSI.

Widespread adoption of Health Information Systems (HIS), including Electronic Health Records (EHR) and Laboratory Information Systems, has generated massive amounts of both quantitative data (e.g., laboratory values) and qualitative data (e.g., text-based clinical notes), fueled in part by funding from the Health Information Technology for Economic and Clinical Health Act of 2009. It is inevitable that the big data management and analytics successfully applied in retail sales and other domains will be researched in healthcare (Murdoch & Detsky, 2013). Recent advances in computational machine learning methods and predictive modeling could be applied to analyze the information content of those lab tests and draw valuable knowledge from any associations with SSI (Sohn et al., 2017). However, clinical laboratory data (like most healthcare data) carry inherent constraints that must be addressed during pre-processing before machine learning algorithms are applied. The sampling of lab tests is based purely on the clinical condition of the patient, as assessed by the providers, so both the timing and location of sampling and the type of tests done are extremely variable. Some tests are usually done just once (e.g., ABO blood grouping), while others are repeated multiple times (e.g., glucose, electrolytes, hemoglobin). Outside research settings, only a core set of lab tests is done routinely on most patients during day-to-day clinical care, and a subset of patients may be tested for disease-specific lab biomarkers. This variability in sampling, in both the type and number of tests and their temporality (time and space), contrasts with statistical and predictive analytic tools, which rely heavily on systematically and regularly sampled data.
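
To make the sampling problem concrete, the following is a minimal Python sketch of how irregular lab records might be turned into a regular patient-by-test table before modeling. The records, patient identifiers, and the choice of mean aggregation followed by per-test median imputation are illustrative assumptions; the preview does not state which aggregation or imputation methods were actually used.

```python
from collections import defaultdict
from statistics import median

# Hypothetical long-format lab records: (patient_id, test_name, day, value).
# Patients are sampled at different times and for different tests.
records = [
    ("p1", "glucose", 0, 110.0),
    ("p1", "glucose", 2, 150.0),
    ("p1", "hemoglobin", 1, 12.5),
    ("p2", "glucose", 1, 95.0),
    ("p3", "hemoglobin", 0, 10.0),
    ("p3", "hemoglobin", 3, 11.0),
]


def build_feature_matrix(records):
    """Collapse repeated measurements to one mean value per patient and test.

    Tests a patient never had are recorded as None (missing).
    """
    values = defaultdict(list)
    for patient, test, _day, value in records:
        values[(patient, test)].append(value)
    patients = sorted({p for p, _ in values})
    tests = sorted({t for _, t in values})
    matrix = {}
    for p in patients:
        row = {}
        for t in tests:
            if (p, t) in values:
                row[t] = sum(values[(p, t)]) / len(values[(p, t)])
            else:
                row[t] = None
        matrix[p] = row
    return matrix, tests


def impute_median(matrix, tests):
    """Fill missing entries with the per-test median of observed values."""
    filled = {p: dict(row) for p, row in matrix.items()}
    for t in tests:
        observed = [row[t] for row in matrix.values() if row[t] is not None]
        fill = median(observed)
        for row in filled.values():
            if row[t] is None:
                row[t] = fill
    return filled


matrix, tests = build_feature_matrix(records)
filled = impute_median(matrix, tests)
for patient, row in filled.items():
    print(patient, row)
```

Averaging repeated measurements discards the timing information noted above, and median imputation assumes values are missing at random, which rarely holds when tests are ordered based on clinical condition. Both choices are simplifications made only to keep the sketch short.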
