Concept-Based Mining Model

Shady Shehata (University of Waterloo, Canada), Fakhri Karray (University of Waterloo, Canada) and Mohamed Kamel (University of Waterloo, Canada)
DOI: 10.4018/978-1-60566-908-3.ch004


Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should indicate the terms that capture the semantics of the text. In this case, the model can capture the terms that present the concepts of the sentence, which leads to discovering the topic of the document. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, and a concept extractor. A term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight. The concepts that have the maximum combined weights are selected by the concept extractor. The concept-based model is used to significantly enhance the quality of text clustering, categorization, and retrieval.
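The weighting and selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two weights are computed or combined, so everything here is an assumption made for illustration only: the weights are taken as given inputs, they are combined by simple addition, and the function names `combine_weights` and `extract_concepts` are hypothetical, not the authors' implementation.

```python
from typing import Dict, List


def combine_weights(stat_w: Dict[str, float],
                    graph_w: Dict[str, float]) -> Dict[str, float]:
    """Combine two per-term weights into a single weight.

    ASSUMPTION: the combination is a plain sum. The source only states
    that the two weights are combined into a new weight.
    Only terms that received both weights are kept.
    """
    shared_terms = set(stat_w) & set(graph_w)
    return {term: stat_w[term] + graph_w[term] for term in shared_terms}


def extract_concepts(combined: Dict[str, float], top_k: int) -> List[str]:
    """Return the top_k terms with the largest combined weight.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical weights, invented purely for this example.
    statistical = {"model": 0.6, "mining": 0.5, "the": 0.1}
    graph_based = {"model": 0.3, "mining": 0.2, "the": 0.0}
    combined = combine_weights(statistical, graph_based)
    print(extract_concepts(combined, top_k=2))
```

With these made-up weights, a frequent but semantically empty term such as "the" ranks last, which is the kind of discrimination the abstract attributes to the model.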
Chapter Preview


Typical text mining tasks include, but are not limited to, text clustering, text categorization, and document retrieval. Clustering is an unsupervised learning paradigm in which clustering methods try to identify inherent groupings of the text documents, producing a set of clusters that exhibit high intra-cluster similarity and low inter-cluster similarity. Generally, text document clustering methods attempt to segregate the documents into groups, where each group represents a topic that is different from the topics represented by the other groups (Aas & Eikvil 1999; Salton, Wong, & Yang 1975; Salton & McGill 1983).
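The clustering criterion above (high intra-cluster similarity, low inter-cluster similarity) can be made concrete with a small sketch. It assumes documents are represented as raw term-frequency vectors and compared with cosine similarity, which is a common choice but is not stated in the text; the helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List


def tf_vector(text: str) -> Counter:
    """Term-frequency vector from whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_intra_similarity(clusters: List[List[Counter]]) -> float:
    """Average similarity over all document pairs within the same cluster."""
    sims = [cosine(x, y) for c in clusters for x, y in combinations(c, 2)]
    return sum(sims) / len(sims) if sims else 0.0


def mean_inter_similarity(clusters: List[List[Counter]]) -> float:
    """Average similarity over all document pairs from different clusters."""
    sims = [cosine(x, y)
            for c1, c2 in combinations(clusters, 2)
            for x in c1 for y in c2]
    return sum(sims) / len(sims) if sims else 0.0


if __name__ == "__main__":
    # Toy documents invented for this example.
    clusters = [
        [tf_vector("text mining model"), tf_vector("text mining concept")],
        [tf_vector("soccer match score"), tf_vector("soccer match goal")],
    ]
    print(mean_intra_similarity(clusters), mean_inter_similarity(clusters))
```

A good clustering in the sense described here is one where the first number is clearly larger than the second.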
