Text mining is an instrumental technology that today’s organizations can employ to extract information from text and turn it into valuable knowledge for more effective knowledge management. It is also an important tool in information systems security (ISS). Although a great deal of text mining research has pursued new technological developments, relatively little attention has been paid to the applications of text mining in ISS. In this chapter, we address a variety of technological applications of text mining to security issues. The techniques are categorized according to the types of knowledge to be discovered and the text formats to be analyzed. Privacy issues of text mining as well as future trends are also discussed.
Defined as “the discovery by computer of new, previously unknown, information by automatically extracting information from different written resources” (Fan et al. 2006), text mining is an emerging technology characterized by a set of technological tools that allow for the extraction of information from unstructured text. With the exponential growth of the Internet, it has become burdensome for individuals as well as companies to process the overwhelming volume of information. Unlike data mining techniques that discover knowledge only from structured data, such as numeric data, text mining finds knowledge in unstructured textual data, including e-mails, Web pages, business reports, and articles. Moving beyond traditional information retrieval toward information and knowledge discovery, text mining applies the same analytical functions of data mining to the domain of textual information and relies on sophisticated text analysis techniques that distill information from free-text documents (Dörre et al. 1999).
Voluminous corporate information must be merged and managed, and the dynamic business environment pushes decision makers to promptly and effectively locate, read, and analyze relevant documents in order to make well-informed decisions. Discovering hidden patterns from the structured data therefore plays an important role in business, where patterns are paramount for strategic decision making. Text mining pursues knowledge discovery from textual databases by isolating key bits of information from large amounts of text, by identifying relationships among documents, and by inferring new knowledge from them (Durfee 2006). Furthermore, Fan et al. (2006) indicated that the key to text mining is creating technology that combines a human’s linguistic capabilities with the speed and accuracy of a computer. Combining the generic process model for text mining applications proposed by Fan et al. (2006) with the general text mining framework suggested by Durfee (2006), we think that the following model captures the processes involved in text mining, from text collection and distillation to knowledge representation (see Figure 1).
Figure 1. Processes involved in text mining (adapted from Durfee 2006; Fan et al. 2006)
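To make the stages of collection, distillation, and relationship discovery more concrete, the following is a minimal sketch in Python. It is not the model proposed by Fan et al. (2006) or Durfee (2006). It assumes a simple bag-of-words representation and cosine similarity as one possible way to identify relationships among documents. The stop-word list, the function names, and the sample documents are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "on", "from"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def term_frequencies(text):
    """Distill free text into a structured bag-of-words representation."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(docs):
    """Return (name_1, name_2, score) for the most similar pair, or None."""
    vectors = {name: term_frequencies(text) for name, text in docs.items()}
    names = sorted(vectors)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            if best is None or score > best[2]:
                best = (first, second, score)
    return best


# Hypothetical mini-collection standing in for the text collection stage.
documents = {
    "email_1": "Password reset requested from an unknown login location",
    "email_2": "Unknown login attempt triggered a password reset",
    "report_1": "Quarterly sales increased in the northern region",
}

if __name__ == "__main__":
    print(most_similar_pair(documents))
```

In this toy collection the two e-mails share most of their vocabulary and are therefore reported as the most closely related pair, while the sales report shares no terms with either. A realistic system would replace the raw term counts with a richer representation and would add the knowledge representation stage, which this sketch omits.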
Key Terms in this Chapter
Web Link Analysis: Is based on hyperlink structure and is used to discover hidden relationships among communities.
Authorship Characterization: Attempts to formulate an author profile by making inferences about gender, education, and cultural backgrounds on the basis of writing style.
Web Content Analysis: Is the systematic study of site content.
Abnormal Detection: Tries to find objects that appear to be inconsistent with the remainder of the object set.
Social Network Analysis: Is the study of mathematical models for interactions among people, groups, and objects.
Authorship Identification: Attributes authorship of unidentified writing on the basis of stylistic similarities between the author’s known works and the unidentified piece; it deals with classification problems.
Text Mining: Is the discovery by computer of new, previously unknown, information by automatically extracting information from different written resources.