Text Mining and Patient Severity Clusters

Patricia Cerrito (University of Louisville, USA)
DOI: 10.4018/978-1-60566-752-2.ch008

Text mining diagnosis codes takes advantage of the links among patient conditions instead of forcing an assumption that the conditions are independent. Combinations of diagnoses are used to define groups of patients. For example, patients with diabetes are more likely than the general population to have heart disease and kidney failure. Rather than treating these three conditions separately, as if anyone in the general population were just as likely to have them in combination, text mining examines the combinations directly: diabetes alone, diabetes with kidney failure, diabetes with heart disease, and diabetes with both conditions.
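The contrast between grouping by diagnosis combinations and assuming independence can be sketched in a few lines of Python. The patient data are invented for illustration, and the names combination_groups and observed_vs_independent are hypothetical, not taken from the chapter. The first function groups patients by the exact set of conditions they have; the second compares how often two conditions occur together with the rate that independence, P(a) x P(b), would predict.

```python
from collections import Counter

# Hypothetical toy data: patient ID -> set of recorded conditions.
PATIENTS = {
    "P1": {"diabetes", "kidney failure"},
    "P2": {"diabetes", "heart disease", "kidney failure"},
    "P3": {"diabetes"},
    "P4": {"heart disease"},
    "P5": set(),
    "P6": {"diabetes", "heart disease"},
    "P7": {"diabetes", "kidney failure"},
    "P8": set(),
}


def combination_groups(patients):
    """Group patients by the exact combination of conditions they have."""
    return Counter(frozenset(conditions) for conditions in patients.values())


def observed_vs_independent(patients, a, b):
    """Return (observed rate of a and b together, rate expected if a and b
    were independent, i.e. P(a) * P(b))."""
    n = len(patients)
    p_a = sum(a in c for c in patients.values()) / n
    p_b = sum(b in c for c in patients.values()) / n
    p_ab = sum(a in c and b in c for c in patients.values()) / n
    return p_ab, p_a * p_b


if __name__ == "__main__":
    for combo, count in combination_groups(PATIENTS).most_common():
        print(sorted(combo) or ["(none)"], count)
    observed, expected = observed_vs_independent(PATIENTS, "diabetes", "kidney failure")
    print(f"observed {observed:.3f} vs. independent {expected:.3f}")
    # prints: observed 0.375 vs. independent 0.234
```

On this toy data, diabetes and kidney failure co-occur in 3 of 8 patients (0.375), well above the 0.234 that independence would predict; grouping by combinations keeps that linkage instead of averaging it away.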
Chapter Preview


Nominal data have always been difficult to use in quantitative analyses, because the text must somehow be compressed into a small number of categories. More recently, methods have been developed that use grammar, syntax, and natural language to quantify information locked in text. These methodologies fall under the general heading of text mining.

The process of text analysis generally involves the following steps:

  1. Transpose the data so that the observational unit is the identifier (for example, the patient) and all of that identifier's nominal values are contained in a single observation.

  2. Tokenize the nominal data so that each nominal value is defined as one token.

  3. Concatenate the nominal tokens into a text string, so that there is one text string per identifier. Each text string is a collection of tokens, and each token represents a noun.

  4. Use text mining to cluster the text strings so that each identifier belongs to exactly one cluster.

  5. Use other statistical methods to define a natural ranking of the clusters.

  6. Use the clusters defined by text mining in other statistical analyses.
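The steps above can be sketched end to end in Python. Everything here is an assumption for illustration: the claims records, the length-of-stay values, the "dx" token prefix, and all function names are hypothetical. Plain k-means on binary token-presence vectors stands in for the text-mining clustering step; it is not the chapter's own clustering method. Step 5 ranks clusters by a hypothetical mean length of stay.

```python
from collections import defaultdict

# Hypothetical claims in "long" format: one row per (patient, diagnosis code).
# Illustrative ICD-9-CM codes: 250.00 diabetes, 585.9 chronic kidney disease,
# 428.0 congestive heart failure, 401.9 essential hypertension.
RECORDS = [
    ("P1", "250.00"), ("P1", "585.9"), ("P1", "428.0"),
    ("P2", "401.9"),
    ("P3", "250.00"), ("P3", "428.0"), ("P3", "585.9"),
    ("P4", "250.00"), ("P4", "401.9"),
    ("P5", "585.9"), ("P5", "428.0"),
    ("P6", "401.9"),
]
# Hypothetical outcome used only to rank clusters: length of stay in days.
LENGTH_OF_STAY = {"P1": 9, "P2": 2, "P3": 7, "P4": 3, "P5": 8, "P6": 1}


def transpose(records):
    """Step 1: one observation per patient, holding all of that patient's codes."""
    by_patient = defaultdict(list)
    for patient_id, code in records:
        by_patient[patient_id].append(code)
    return dict(by_patient)


def tokenize(code):
    """Step 2: make each code a single token (drop the '.' a tokenizer might split on)."""
    return "dx" + code.replace(".", "")


def to_text(by_patient):
    """Step 3: concatenate each patient's tokens into one text string."""
    return {pid: " ".join(tokenize(c) for c in codes) for pid, codes in by_patient.items()}


def _sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(texts, k, max_iter=20):
    """Step 4 (stand-in method): k-means on binary token-presence vectors.

    The first k patients in sorted ID order seed the centroids, so the
    result is deterministic for this sketch.
    """
    token_sets = {pid: set(text.split()) for pid, text in texts.items()}
    vocab = sorted(set().union(*token_sets.values()))
    ids = sorted(texts)
    vectors = {pid: [1.0 if t in token_sets[pid] else 0.0 for t in vocab] for pid in ids}
    centroids = [vectors[pid][:] for pid in ids[:k]]

    def nearest(vec):
        return min(range(k), key=lambda j: _sq_dist(vec, centroids[j]))

    assignment = {}
    for _ in range(max_iter):
        new_assignment = {pid: nearest(vectors[pid]) for pid in ids}
        if new_assignment == assignment:
            break  # converged: no patient changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [vectors[pid] for pid in ids if assignment[pid] == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def rank_clusters(assignment, outcome):
    """Step 5: rank clusters by mean outcome (rank 1 = lowest mean, least severe)."""
    values = defaultdict(list)
    for pid, cluster in assignment.items():
        values[cluster].append(outcome[pid])
    means = {c: sum(v) / len(v) for c, v in values.items()}
    ranks = {c: r for r, c in enumerate(sorted(means, key=means.get), start=1)}
    return ranks, means


if __name__ == "__main__":
    texts = to_text(transpose(RECORDS))
    clusters = kmeans(texts, k=2)
    ranks, means = rank_clusters(clusters, LENGTH_OF_STAY)
    for pid in sorted(texts):
        c = clusters[pid]
        print(pid, texts[pid], "| cluster", c, "| severity rank", ranks[c])
```

On this toy data, P1, P3, and P5 (kidney disease and heart failure codes, with or without diabetes) form one cluster with a mean stay of 8 days, and P2, P4, and P6 (hypertension, with or without diabetes) form the other with a mean of 2 days, so the first cluster gets the higher severity rank. For step 6, that rank could enter a regression or other model as a categorical covariate.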
