Introduction
Information processing is gradually moving towards the management of semi-structured and unstructured data (Kernochan, 2006). Most data-mining research assumes that the information to be "mined" is already held in a relational database. Unfortunately, according to Raymond et al. (2006), in many applications the available electronic information is unstructured. Text mining is a technology that makes it possible to discover patterns and trends semi-automatically in huge collections of unstructured text. It draws on technologies such as natural language processing, information retrieval, information extraction, and data mining (Andrea et al., 2010). The term text mining was coined to describe tools used to manage textual information. Text mining, also known as knowledge discovery in textual databases (Ah-Hwee, 1999), can be defined as the application of data mining techniques to the automated discovery of useful or interesting knowledge from unstructured text (Han et al., 2000). It makes possible a technology that combines human linguistic capabilities with the speed and accuracy of a computer. Text mining aims to analyze the content of each document in more detail and to extract interesting information that can only be obtained by viewing multiple documents as a whole, such as trends and significant features that may trigger useful actions and decision making (Nasukawa et al., 2001).
Text data mining is a much more complex task than data mining (Ah-Hwee, 1999) because it involves text data that are inherently unstructured and fuzzy. Knowledge discovery in text can be broadly divided into two main phases: first, the transformation of free-form text documents into an internal or intermediate form; and second, text mining proper, also called knowledge distillation, which deduces patterns or knowledge from that intermediate form.
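The two phases can be illustrated with a deliberately small sketch. It assumes, purely for illustration, that the intermediate form is the set of lowercase, non-stop-word keywords in each document, and that the "pattern" distilled is the set of keywords occurring in at least a minimum number of documents. The stop-word list, the function names `refine` and `distill`, and the sample responses are hypothetical and are not taken from the system described in this paper.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "or",
              "of", "to", "in", "my", "it", "very", "too"}


def refine(document: str) -> frozenset:
    """Phase 1: transform free-form text into an intermediate form,
    here simply the set of lowercase tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS)


def distill(intermediate: list, min_docs: int) -> dict:
    """Phase 2 (knowledge distillation): deduce a simple pattern from the
    intermediate form, here the keywords found in at least `min_docs` documents."""
    doc_freq = Counter(word for doc in intermediate for word in doc)
    return {word: n for word, n in doc_freq.items() if n >= min_docs}


# Hypothetical questionnaire responses.
responses = [
    "The battery life is too short.",
    "Battery drains fast and the screen is dim.",
    "Great screen, but customer service was slow.",
]
docs = [refine(r) for r in responses]
print(sorted(distill(docs, min_docs=2).items()))  # [('battery', 2), ('screen', 2)]
```

Keeping the two phases as separate functions mirrors the distinction drawn above: the distillation step never sees raw text, only the intermediate representation, so either phase can be replaced independently.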
This paper aims to study particular features of texts, to identify patterns that may be used to make relevant business decisions, and to discuss the tools that may be used for this purpose. In discussing the tools, we describe a text mining technique based on a modification of the GARW algorithm (Hany, Dietmar, Nabil, & Fawzy, 2007). The text documents were drawn from questionnaires administered to elicit information for effective customer relationship management in the mobile phone manufacturing industry.
The motivations for choosing this domain are as follows:
- A study of the customer service channels used by 60 firms found that information is most often stored in unstructured form (Strauss, El-Ansary, & Frost, 2006).
- In customer relationship management and competitive intelligence, the challenge is not a lack of information but the ability to distinguish useful information from chatter or even disinformation, and to make full use of the richness of these heterogeneous information sources (Solomon et al., 2003).
In customer relationship management, information is the raw material for decision making (Graham, John, & Nigel, 2004). Effective market decisions are therefore based on sound information, and those decisions are no better than the information on which they are based. Information is thus the lubricant of business intelligence. The more information a firm has, the more value it can provide to each customer and the better its prospects of making accurate, timely and relevant offerings (Strauss et al., 2006).
In this paper, we therefore present the use of association rules in text mining. These rules highlight correlations between keywords in the text. Association rules suit this application area because they are easy to understand and interpret for the top management staff who might be the users of such a system.
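To make the idea of a keyword association rule concrete, the sketch below computes rules of the form X → Y between single keywords using the standard support and confidence measures: support is the fraction of documents containing both X and Y, and confidence is support(X ∪ Y) / support(X). This is a plain pairwise computation under that assumption, not the GARW algorithm or the modification described in this paper; the function name `keyword_rules`, the thresholds, and the sample keyword sets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_rules(docs, min_support, min_confidence):
    """Return (x, y, support, confidence) for every rule x -> y between
    single keywords that meets both thresholds.

    support(x -> y)    = (# documents containing x and y) / (# documents)
    confidence(x -> y) = (# documents containing x and y) / (# documents containing x)
    """
    n = len(docs)
    # Number of documents containing each single keyword.
    single = Counter(kw for doc in docs for kw in doc)
    # Number of documents containing each unordered keyword pair.
    pair = Counter(p for doc in docs for p in combinations(sorted(doc), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules, one in each direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical keyword sets extracted from four questionnaire responses.
docs = [
    {"battery", "short", "life"},
    {"battery", "short", "charger"},
    {"battery", "screen"},
    {"screen", "dim"},
]
print(keyword_rules(docs, min_support=0.5, min_confidence=0.8))
# [('short', 'battery', 0.5, 1.0)]
```

The output reads as "every response that mentions *short* also mentions *battery*", which is the kind of plain-language statement that makes association rules accessible to management readers. Lowering `min_confidence` to 0.6 would also admit the weaker rule battery → short (confidence 2/3).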
The rest of the paper is organized as follows. The second section reviews related work, the third section presents the text mining system architecture, and the fourth section presents the implementation and discussion. The fifth section describes the evaluation results, while the sixth and seventh sections present the conclusion and future work, respectively.