Nominal data have always been difficult to use in quantitative analyses. Before such data can be analyzed, their text values must somehow be compressed into a small number of categories. More recently, methods have been developed that use grammar, syntax, and natural language to quantify information locked in text. These methods are grouped under the general heading of text mining.
The process of text analysis generally involves the following steps:
1. Transpose the data so that the identifier is the observational unit and all of its nominal values appear in that unit's row.
2. Tokenize the nominal data so that each nominal value becomes exactly one token.
3. Concatenate the tokens into one text string per identifier. Each text string is a collection of tokens, and each token represents a noun.
4. Use text mining to cluster the text strings so that each identifier belongs to exactly one cluster.
5. Use other statistical methods to define a natural ranking of the clusters.
6. Use the clusters defined by text mining as variables in other statistical analyses.
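The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration that rests on several assumptions. The input is long-format `(identifier, nominal value)` pairs. A multi-word value becomes one token by joining its words with underscores. Plain k-means on binary token vectors stands in for the text-mining clustering step, whereas a dedicated text-mining tool would often weight terms and reduce dimensions first. The function names, the seeding rule, the sample records, and the choice to rank clusters by the mean of a numeric outcome are all hypothetical. This is not a prescribed method.

```python
import math
import re
from collections import defaultdict


def tokenize(value):
    """Tokenize: collapse one nominal value into a single token.

    Hypothetical rule: lowercase the value and join its words with
    underscores, so "Broken Arm" becomes "broken_arm".
    """
    return re.sub(r"\W+", "_", value.strip().lower()).strip("_")


def build_documents(records):
    """Transpose and concatenate: group (identifier, value) rows by
    identifier and join the tokens into one text string per identifier."""
    tokens_by_id = defaultdict(list)
    for ident, value in records:
        tokens_by_id[ident].append(tokenize(value))
    return {ident: " ".join(toks) for ident, toks in tokens_by_id.items()}


def vectorize(documents):
    """Binary bag-of-tokens vectors over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in documents.values() for tok in doc.split()})
    vectors = {}
    for ident, doc in documents.items():
        present = set(doc.split())
        vectors[ident] = [1.0 if tok in present else 0.0 for tok in vocab]
    return vocab, vectors


def kmeans(vectors, k, iterations=20):
    """Cluster: plain k-means (a stand-in for a text-mining clusterer),
    seeded with the first k identifiers in sorted order for reproducibility.
    Returns a dict mapping each identifier to exactly one cluster index."""
    ids = sorted(vectors)
    centroids = [list(vectors[i]) for i in ids[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign every identifier to its nearest centroid (Euclidean distance).
        new_assignment = {
            i: min(range(k), key=lambda c: math.dist(vectors[i], centroids[c]))
            for i in ids
        }
        if new_assignment == assignment:
            break  # converged: no identifier changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in ids if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def rank_clusters(assignment, outcome):
    """Rank (hypothetical choice): order clusters by the mean of a numeric
    outcome per identifier; the lowest mean gets rank 1."""
    values = defaultdict(list)
    for ident, cluster in assignment.items():
        values[cluster].append(outcome[ident])
    means = {c: sum(v) / len(v) for c, v in values.items()}
    return {c: rank for rank, c in enumerate(sorted(means, key=means.get), start=1)}


# Hypothetical long-format input: one row per (identifier, nominal value).
records = [
    ("A", "Diabetes"), ("A", "Hypertension"),
    ("B", "Diabetes"), ("B", "Hypertension"), ("B", "Obesity"),
    ("C", "Broken Arm"), ("C", "Sprained Ankle"),
    ("D", "Broken Arm"),
]
documents = build_documents(records)
_, vectors = vectorize(documents)
clusters = kmeans(vectors, k=2)
# Hypothetical numeric outcome per identifier, used only for ranking.
ranks = rank_clusters(clusters, {"A": 5.0, "B": 7.0, "C": 1.0, "D": 2.0})
print(documents["C"])  # broken_arm sprained_ankle
print(clusters)        # {'A': 1, 'B': 1, 'C': 0, 'D': 0}
print(ranks)           # {0: 1, 1: 2}
```

In this toy run, identifiers A and B share chronic-condition tokens and fall into one cluster. The injury records C and D fall into the other. The cluster labels, or their ranks, could then be attached to each identifier as ordinary categorical or ordinal variables in later analyses.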