A Technique to Exploit Free-Form Notes to Predict Customer Churn


Gregory W. Ramsey, Sanjay Bapna
DOI: 10.4018/ijcmam.2014010101


As healthcare costs rise, hospitals are seeking ways to improve operations. This paper examines the usefulness of free-form notes for solving a classification problem commonly associated with customer churn. The authors show that classifiers that incorporate free-form notes, using natural language processing techniques, are up to 9% more accurate than classifiers developed solely from structured data. The authors suggest that hospitals and chronic disease management clinics can use structured data and free-form notes from electronic health records to predict which patients are likely to cease receiving care from their facilities. Classification tools for predicting patient churn are of interest to hospital administrators because such information can aid in resource planning and facilitate smoother handoffs between care providers.


Customer churn occurs when customers cease using a firm’s service or product, ending their paying relationship with the firm for voluntary or involuntary reasons (Lin, Tzeng, & Chin, 2011). This issue is of great concern to commercial entities because acquiring and retaining customers is costly for most businesses (Applebaum, 2001). Hospitals and other healthcare facilities face a similar problem with patient churn. Rather than focusing on the cost of acquiring patients, however, these facilities need to anticipate churn to support resource planning, patient handoffs between care providers, and the coordination of patient follow-ups.

Churn has been modeled and predicted using a number of techniques, including logistic regression, neural networks, and decision trees (e.g., Burez & Van den Poel, 2009; Coussement & Van den Poel, 2008; Jiayin, Yangming, Yingying, & Shuang, 2006; Neslin, Gupta, Kamakura, Lu, & Mason, 2006; Pendharkar, 2009; Verbeke, Martens, Mues, & Baesens, 2011). The majority of these models are constructed using structured data, which is typically maintained within corporate and/or administrative databases. Yet approximately 85% of data within firms is either unstructured or semi-structured (Negash, 2004). A distinction is made between unstructured data (e.g., a free-form note that is captured without any associated context) and semi-structured data (e.g., a free-form note that is captured and associated with a particular patient’s account, thus establishing a context) (Negash, 2004). Semi-structured data is voluminous within the healthcare environment (Berg, 2001; Miettinen & Korhonen, 2008; Srinivas, Rani, & Govrdhan, 2010). Some churn models based on structured data are briefly discussed in this paper; however, the focus is on investigating the usefulness of including semi-structured data (free-flowing text) in models that estimate how likely recipients of services are to churn.
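
To make the idea of combining structured and semi-structured data concrete, the following is a minimal sketch and not the authors' method. It assumes a binary bag-of-words encoding of each note, appended to the structured fields and fed to a logistic regression fitted by gradient descent. The records, field meanings, function names, and parameter values are all hypothetical and invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(note):
    """Lowercase a free-form note and split it into word tokens."""
    return re.findall(r"[a-z']+", note.lower())


def build_vocabulary(notes, min_count=1):
    """Map each token found in at least min_count notes to a column index."""
    counts = Counter(tok for note in notes for tok in set(tokenize(note)))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def featurize(structured, note, vocab):
    """Append binary bag-of-words indicators to the structured values."""
    bow = [0.0] * len(vocab)
    for tok in tokenize(note):
        if tok in vocab:
            bow[vocab[tok]] = 1.0
    return list(structured) + bow


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Estimated probability that the record belongs to the churn class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy records: (structured fields, free-form note, churned?).
# The two structured fields stand in for, e.g., a scaled visit frequency
# and a missed-appointment flag; none of this comes from the paper's data.
records = [
    ((0.2, 1.0), "patient moving out of state next month", 1),
    ((0.8, 0.0), "follow up scheduled, patient satisfied with care", 0),
    ((0.3, 1.0), "patient unhappy with wait times, considering another clinic", 1),
    ((0.9, 0.0), "routine visit, refill provided, follow up in three months", 0),
]

vocab = build_vocabulary([note for _, note, _ in records])
X = [featurize(s, note, vocab) for s, note, _ in records]
y = [label for _, _, label in records]
weights, bias = train_logistic(X, y)

# Score a new (equally hypothetical) record.
new_x = featurize((0.4, 1.0), "patient considering another clinic", vocab)
print(round(predict_proba(new_x, weights, bias), 3))
```

Because each note is tied to a patient record, its tokens can sit in the same feature vector as that record's structured fields, which is what distinguishes semi-structured from fully unstructured text here. A real study would need held-out evaluation and a far larger sample than this toy set.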

The paper is organized as follows. First, a literature review is presented in three parts: (a) a discussion of the types of patient churn and the issues associated with it, (b) an identification of some of the more popular methods for modeling churn, and (c) a presentation of text mining techniques that will be applied to develop a churn model based on semi-structured data. Second, the methodology for developing and testing different churn models is reported. Third, results of a series of prediction experiments are presented. Finally, the discussion and conclusion section relates the experimental findings to possible real-world opportunities for predicting patient churn.
