Maximal Sequential Patterns: A Tool for Quantitative Semantic in Text Analysis

René Arnulfo García-Hernández (Autonomous University of the State of Mexico, Mexico), J. Fco. Martínez-Trinidad (National Institute of Astrophysics, Mexico) and J. Ariel Carrasco-Ochoa (National Institute of Astrophysics, Mexico)
DOI: 10.4018/978-1-60960-881-1.ch010

Abstract

This chapter introduces maximal sequential patterns, explains how to extract them, and presents some of their applications in document processing and web content mining. Its main objective is to show that maximal sequential patterns preserve document semantics and could therefore be a good alternative to the word and n-gram models. First, the chapter introduces the problem of maximal sequential pattern mining when the data are sequential chains of words. Next, it defines several basic concepts and the problem of maximal sequential pattern mining in text documents. Then, it presents two algorithms proposed by the authors of this chapter for efficiently finding maximal sequential patterns in text documents. Additionally, it describes the use of maximal sequential patterns as a quantitative semantic tool for solving different problems related to document processing and web content mining. Finally, it outlines some future research directions and conclusions.
Chapter Preview

Background

In recent decades, the amount of information stored in electronic devices has grown rapidly. In (Leavit, 2002), the author reported that about 20 percent of the electronic information in companies is stored in structured databases, where objects or records are easily accessible. This situation motivated interest in analyzing the information stored in this kind of database. A research area that focuses on the analysis of information stored in structured databases is Knowledge Discovery, defined in (Fayad, Piatetsky-Shapiro, & Padhraic, 1996) as “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”. These patterns should be easily understandable by the user. The key step in the process of knowledge discovery in databases is data mining, which, according to (Fayad, Piatetsky-Shapiro, & Padhraic, 1996), is “a step in the Knowledge Discovery in Databases process that consists in applying data analysis and discovery algorithms that produce a particular enumeration of patterns (or models) over the data”.
