Classifier Ensemble Methods for Diagnosing COPD from Volatile Organic Compounds in Exhaled Air

Ludmila Ilieva Kuncheva (School of Computer Science, Bangor University, Bangor Gwynedd, UK), Juan Jose Rodríguez (Escuela Politécnica Superior, Universidad de Burgos, Burgos, Spain), Yasir Iftikhar Syed (Respiratory Department, Prince Philip Hospital, Dafen, Llanelli, UK), Christopher O. Phillips (Welsh Centre for Printing and Coating, College of Engineering, Swansea University, Swansea, UK) and Keir Edward Lewis (Respiratory Department, Prince Philip Hospital, & College of Medicine, Swansea University, Llanelli, UK)
Copyright: © 2012 | Pages: 15
DOI: 10.4018/jkdb.2012040101


The diagnosis of Chronic Obstructive Pulmonary Disease (COPD) is based on symptoms, clinical examination, exposure to risk factors (smoking and certain occupational dusts) and confirmation of lung airflow obstruction on spirometry. However, most people with COPD remain undiagnosed and controversies regarding spirometry persist. Developing accurate and reliable automated tests for the early diagnosis of COPD would aid successful management. We evaluated the diagnostic potential of a non-invasive test based on chemical analysis of volatile organic compounds (VOCs) in exhaled breath. We applied 26 individual classifier methods and 30 state-of-the-art classifier ensemble methods to a large VOC data set from 109 patients with COPD and 63 healthy controls of similar age; we evaluated the classification error, the F measure and the area under the ROC curve (AUC). The results show that classifying the VOCs leads to a substantial gain over chance, but with varying accuracy. We found that the Rotation Forest ensemble (AUC 0.825) had the highest accuracy for COPD classification from exhaled VOCs.

Chronic Obstructive Pulmonary Disease (COPD) is characterised by airflow limitation that is not fully reversible. The causes are largely attributed to inhaling tobacco smoke, occupational exposure to dust and chemicals, and indoor and outdoor pollution (Rabe et al., 2007). COPD is a major public health problem and is the only one of the top five causes of death in the first world that is still rising. It is predicted to become the third leading cause of death by 2030, according to a study published by the World Bank/World Health Organization (WHO, 2008), and accounts for much chronic illness and morbidity. Yet the Global Initiative for Chronic Obstructive Lung Disease (GOLD) report by Rabe et al. (2007) admits that COPD remains relatively unknown or ignored by the public as well as by public health and government officials.

The current diagnosis of COPD is based on reported symptoms, the patient’s medical history (particularly exposure to risk factors), clinical examination, and then confirmation of lung airflow obstruction by spirometry, where Forced Expiratory Volume in 1 second (FEV1) divided by Forced Vital Capacity (FVC) is less than 0.70 and FEV1 is less than 80% of the predicted value (Rabe et al., 2007).
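The spirometric part of this criterion is simple arithmetic, and a minimal Python sketch may make it concrete. The function name, argument names and default cut-offs below are illustrative assumptions rather than anything taken from the article; the defaults use the commonly cited GOLD values (ratio below 0.70, FEV1 below 80% of predicted) and are exposed as parameters so that they can be changed. It is not a diagnostic tool.

```python
def airflow_obstruction(fev1_l, fvc_l, fev1_predicted_l,
                        ratio_cutoff=0.70, predicted_cutoff=0.80):
    """Return True when both spirometric conditions hold.

    fev1_l           -- measured FEV1 in litres
    fvc_l            -- measured Forced Vital Capacity in litres
    fev1_predicted_l -- predicted (reference) FEV1 in litres
    The two cut-offs are assumptions and can be overridden.
    """
    if fvc_l <= 0 or fev1_predicted_l <= 0:
        raise ValueError("FVC and predicted FEV1 must be positive")
    ratio = fev1_l / fvc_l                            # FEV1 / FVC
    fraction_of_predicted = fev1_l / fev1_predicted_l  # FEV1 as a share of predicted
    return ratio < ratio_cutoff and fraction_of_predicted < predicted_cutoff


# Hypothetical readings: FEV1 1.5 L, FVC 3.0 L, predicted FEV1 3.0 L
obstructed = airflow_obstruction(1.5, 3.0, 3.0)
```

Symptoms, history and clinical examination, which the paragraph lists alongside spirometry, are deliberately outside the scope of this sketch.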

Developing accurate and reliable automatic tests for the early diagnosis of COPD is crucial for disease management, as removing risk factors and starting inhaled treatments early have been shown to prevent progression, chronic ill health and premature death (Rabe et al., 2007). The current main test, spirometry, is effort-dependent and often performed poorly. It can lead to overdiagnosis in the elderly and underdiagnosis in the young. Moreover, it has not been validated in ethnic minorities (Rabe et al., 2007). The quest for a reliable biomarker in COPD is ongoing.

The smell of breath has long been linked with illness or physical conditions. Can volatile organic compounds (VOCs) measured in exhaled breath be used to identify COPD? Following Pauling's (1971) initial description of around 200 VOCs in exhaled breath, the trapping, detection and analysis of breath VOCs have been developed further. VOC analysis has been used to distinguish smokers from non-smokers (van Berkel et al., 2008) and to recognise asthma (Ibrahim et al., 2011; Fens et al., 2009), lung cancer (Ulanowska et al., 2011; Machado et al., 2005; Phillips et al., 2003; Bajtarevic et al., 2010; Barkar, 2006) and tuberculosis (Phillips et al., 2007). Diagnosis of COPD from VOCs has also been attempted (Basanta et al., 2010; Fens et al., 2009; van Berkel et al., 2010; Phillips et al., 2012).

Here we study the diagnostic potential of the chemical signature of exhaled breath for distinguishing between patients with COPD and healthy controls. We apply a large collection of state-of-the-art classification methods developed within the areas of pattern recognition, machine learning and data mining, with a special focus on classifier ensembles. We apply these methods to the largest data set so far, derived from our previous work (Phillips et al., 2012). We demonstrate that the ensemble methods are superior to the individual classifier methods, resulting in better classification accuracy, F measure and area under the ROC curve (AUC).
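For readers unfamiliar with the three evaluation measures named here, the following is a minimal pure-Python sketch of how classification error, the F measure and the AUC can be computed for a two-class problem. The function names and the toy labels and scores are hypothetical and are not data or code from the study. The AUC is computed through its rank interpretation: the proportion of (COPD, control) pairs in which the COPD case receives the higher score, with ties counted as one half.

```python
def classification_error(y_true, y_pred):
    """Fraction of cases whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = COPD, 0 = healthy control
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]                 # classifier outputs
predicted = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

error = classification_error(labels, predicted)
f1 = f_measure(labels, predicted)
area = auc(labels, scores)
```

Classification error and the F measure depend on the decision threshold applied to the scores, whereas the AUC depends only on how the scores rank the cases.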
