Analysis of Textual Data Based on Inductive Learning Techniques

Shigeaki Sakurai (IT Research and Development Center, Toshiba Solutions Corporation, Tokyo, Japan)
Copyright: © 2013 | Pages: 18
DOI: 10.4018/ijirr.2013040103

Abstract

This paper introduces methods for discovering knowledge from textual data based on inductive learning techniques. The author discusses three methods for extracting features from textual data: the first uses a key concept dictionary, the second uses a key phrase pattern dictionary, and the third uses a named entity extractor. The extracted features are used to generate rules that represent relationships between the features and text classes; these rules are described in the format of a fuzzy decision tree. The features are also used to acquire a classification model based on a support vector machine (SVM), which can classify new textual data into the text classes with high accuracy. Lastly, the paper introduces two application tasks based on these methods and verifies the methods' effectiveness.
Background

Rule discovery methods have been studied in the field of machine learning since the start of research into artificial intelligence. These studies have yielded many techniques, such as decision trees, neural networks, genetic algorithms, and association rules, each of which acquires a rule set from structured data.

A decision tree describes a rule set as a tree structure, and the tree can be regarded as a set of IF-THEN rules. C4.5 (Quinlan, 1992) is one example of an algorithm that acquires a compact tree with comparatively high classification accuracy from structured data. Each item of the data is composed of attribute values and a class, and the algorithm uses an information criterion to acquire the tree efficiently.

A neural network describes a rule set as a network structure. The network stores relationships between attributes and classes as the weights of its links, and these weights are adjusted by the backpropagation algorithm.

A genetic algorithm (Holland, 1992), inspired by the concept of evolution, can also acquire a rule set from structured data. The algorithm encodes a rule or a rule set as a solution and repeatedly improves a set of solutions, using three operations (crossover, mutation, and selection), in search of the optimum solution.

Association rules (Agrawal & Srikant, 1994) describe relationships between items. In retail, for example, an item is a product listed on a receipt. If an item set is frequent, all of its subsets are also frequent; this is called the Apriori property. Equivalently, an item set that contains an infrequent subset cannot itself be frequent. Based on this property, association rules can be discovered by expanding small frequent item sets into larger item sets that contain them.
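The level-wise expansion of frequent item sets can be illustrated with a minimal Python sketch. It is not the implementation from Agrawal & Srikant (1994) or from this paper; the function name `frequent_itemsets`, the absolute `min_support` count, and the toy receipts are assumptions made only for illustration. The sketch covers the frequent item set step and omits the derivation of rules from those sets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for item sets seen in at least
    `min_support` transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-item sets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune (Apriori property): every (k-1)-subset must be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count the surviving candidates against the transactions.
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        result.update(current)
        k += 1
    return result


# Toy receipts (assumed data): each set is the products on one receipt.
receipts = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(receipts, min_support=2)
```

In this toy data, the pair {milk, butter} appears on only one receipt, so the three-item candidate {bread, milk, butter} is pruned before any counting. This is the saving that the Apriori property provides.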
