This chapter describes the application of several text mining techniques to discover patterns in the health insurance schedule, with the aim of uncovering any inconsistency or ambiguity in the schedule. In particular, we first apply a simple “bag of words” technique to study the text data and to evaluate the hypothesis: Is there any inconsistency in the text descriptions of the medical procedures? It is found that the hypothesis is not valid, and hence the investigation continues with how best to cluster the text. This work is significant to health insurers in two ways. First, it would assist them in differentiating the descriptions of medical procedures. Second, it would assist them in describing medical procedures in an unambiguous manner.
Australian Health Insurance System
Australia has a universal health insurance system for its citizens and permanent residents. This publicly funded health insurance scheme is administered by a federal government department called the Health Insurance Commission (HIC). In addition, the Australian Department of Health and Ageing (DoHA), after consultation with the medical fraternity, publishes a manual called the Medicare Benefit Schedule (MBS), which details each medical treatment procedure and the associated rebate payable to the medical service providers who provide it. When a patient visits a medical service provider, the HIC will refund the patient or pay the medical service provider at the rate published in the MBS (the MBS is publicly available online from http://www.health.gov.au/pubs/mbs/mbs/css/index.htm).
Therefore, the descriptions of medical treatment procedures in the MBS should be clear and open to only one interpretation by a reasonable medical service provider, because ambiguities would lead to the wrong medical treatment procedure being used to invoice the patient or the HIC. However, the MBS has developed over many years through extensive consultations with medical service providers. Consequently, inconsistencies or ambiguities may exist within the schedule. In this chapter, we propose to use text mining methodologies to discover whether there are any ambiguities in the MBS.
The MBS is divided into seven categories, each of which describes a collection of treatments of a particular type, such as diagnostic treatments, therapeutic treatments, oral treatments, and so on. Each category is further divided into groups. For example, category 1 contains 15 groups, A1, A2, …, A15. Within each group, there are a number of medical procedures, each denoted by a unique item number. In other words, the MBS is arranged as a hierarchical tree, designed so that medical service providers can easily find the items that represent the medical procedures provided to the patient.2 This underlying MBS structure is outlined in Figure 1.
Figure 1. An overview of the MBS structure in 1999
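The category, group, and item hierarchy can be pictured as a nested mapping. The following Python sketch is illustrative only: the item identifiers and descriptions are invented placeholders rather than entries from the MBS, and the names `mbs` and `flatten` are hypothetical.

```python
# Illustrative only: placeholder entries, not real MBS items.
mbs = {
    1: {  # category
        "A1": {  # group
            "item-0001": "placeholder description of a consultation",
            "item-0002": "placeholder description of a longer consultation",
        },
        "A2": {
            "item-0003": "placeholder description of another attendance",
        },
    },
}


def flatten(schedule):
    """Return (category, group, item, description) rows from the nested mapping."""
    return [
        (category, group, item, description)
        for category, groups in schedule.items()
        for group, items in groups.items()
        for item, description in items.items()
    ]


rows = flatten(mbs)
```

Flattening the tree in this way gives one row per item, with the category available as a label for each description, which is the form a text classifier would typically consume.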
This chapter evaluates the following:
Hypothesis: Given the way the items are organised in the MBS (Figure 1), are there any ambiguities within this classification? Here, ambiguity is measured by a confusion table that compares the classification produced by text mining techniques with the classification given in the MBS. Ideally, if the items are arranged without any ambiguity (as measured by text mining techniques), the confusion table is diagonal, with all off-diagonal terms equal to zero.
Optimal grouping: Assuming that the classification given in the MBS is ambiguous (as revealed in our subsequent investigation of the hypothesis), what is the “optimal” arrangement of the item descriptions, where “optimal” is measured with respect to text mining techniques? In other words, we wish to find a grouping of the item descriptions that minimises the number of misclassifications.
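To make the confusion table concrete, the following minimal Python sketch builds one from bag-of-words vectors. It is not the method used in this chapter: a nearest-centroid classifier with cosine similarity is assumed purely for illustration, the five descriptions and their labels are invented, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
import math


def bag_of_words(text):
    """Lower-cased, whitespace-tokenised word counts (word order is discarded)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def confusion_table(items):
    """items: list of (given_label, description) pairs.

    Returns a Counter keyed by (given_label, predicted_label).
    """
    # One centroid per label: the summed word counts of its descriptions.
    centroids = defaultdict(Counter)
    for label, text in items:
        centroids[label].update(bag_of_words(text))

    table = Counter()
    for label, text in items:
        bow = bag_of_words(text)
        # Assumed classifier: assign the label of the most similar centroid.
        predicted = max(sorted(centroids), key=lambda c: cosine(bow, centroids[c]))
        table[(label, predicted)] += 1
    return table


def misclassified(table):
    """Sum of the off-diagonal entries of the confusion table."""
    return sum(n for (given, predicted), n in table.items() if given != predicted)


# Invented toy data; the last description is deliberately ambiguous.
items = [
    ("diagnostic", "x-ray imaging of chest"),
    ("diagnostic", "ultrasound imaging of abdomen"),
    ("therapeutic", "surgical removal of lesion"),
    ("therapeutic", "surgical repair of tendon"),
    ("therapeutic", "imaging of chest wall"),
]
table = confusion_table(items)
```

On this toy data the last description shares more vocabulary with the "diagnostic" descriptions than with its own category, so it produces a single off-diagonal entry. A grouping with no such entries would give a diagonal table, and the off-diagonal total returned by `misclassified` is the kind of quantity an "optimal" grouping would seek to minimise.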
The benefits of this work are as follows: