Search for Information in Text Files

Mouhcine El Hassani (Sultan Moulay Slimane University, Morocco), Noureddine Falih (Sultan Moulay Slimane University, Morocco) and Belaid Bouikhalene (Sultan Moulay Slimane University, Morocco)
Copyright: © 2020 | Pages: 9
DOI: 10.4018/978-1-7998-1021-6.ch004

Abstract

As information becomes increasingly abundant and accessible on the web, researchers no longer need to dig through books in libraries. Instead, they need a knowledge extraction system from text (KEST). The goal of the authors in this chapter is to identify the needs of a person searching a text, which may be unstructured, to retrieve the terms related to the subject of the search, and then to structure them into classes of useful information. These classes can subsequently help define the general architecture of an information retrieval system for text documents, so that it can be developed, and finally identify the parameters for evaluating its performance and the results retrieved.

Principle Of Text Mining

Several technical definitions of text mining can be found on the Internet and in textbooks, such as that of Un Yong Nahm and Raymond J. Mooney in “Text Mining with Information Extraction” (Nahm and Mooney, 2002). The best-known definition describes text mining as the entire process, related to artificial intelligence, of looking for patterns in unstructured text with the aim of identifying association rules. Many methods are based on sorting and grouping words (operations familiar from SQL, the Structured Query Language) and on counting how often each word is repeated in order to gauge its importance.
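
The counting, grouping, and sorting of words mentioned above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function name word_frequencies, the sample sentence, and the simple tokenization rule are assumptions made for illustration only.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Group identical words and count how often each one occurs.

    Hypothetical helper: words are lowercased runs of letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Illustrative sample text (an assumption, not data from the chapter).
sample = "Text mining finds patterns in text. Mining text is useful."

freq = word_frequencies(sample)

# most_common sorts the grouped words by descending count,
# which serves here as a crude measure of importance.
top = freq.most_common(2)
print(top)
```

The Counter plays the role of a SQL GROUP BY with COUNT, and most_common plays the role of ORDER BY ... DESC; real systems usually also remove very common function words before ranking.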

To understand the principle of finding information, consider how users express their needs: as a query made up of independent keywords, or of keywords linked by logical operators such as AND, OR, and NOT. Some information retrieval applications use such queries to consult indexes of databases containing web pages or collected files.
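
As a rough illustration of how keyword queries with logical operators can be answered from an index, the following Python sketch builds a small inverted index (word to set of document identifiers) and evaluates the operators as set operations. The three documents, the function build_index, and the variable names are hypothetical; the chapter does not prescribe this implementation.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Illustrative collection (an assumption, not data from the chapter).
docs = {
    1: "text mining extracts patterns",
    2: "web pages contain text",
    3: "mining web data",
}

index = build_index(docs)
all_ids = set(docs)

# Logical operators map onto set operations over the index entries.
text_and_mining = index["text"] & index["mining"]  # AND: intersection
text_or_web = index["text"] | index["web"]         # OR: union
not_web = all_ids - index["web"]                   # negation: complement

print(text_and_mining, text_or_web, not_web)
```

Because the index is consulted instead of the documents themselves, the cost of a query depends on the size of the matching sets rather than on the total amount of text.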

In general, a text mining process takes place in four steps:

  • The first step is to prepare the data for processing, transforming the raw data from one form to another so that it can be submitted to the appropriate operations.

  • The second step is to search for frequent patterns in the extracted text and to extract association rules.

  • The third step is to present data in visual form using graphs or diagrams. A data processing tool such as 2D or 3D visualization software would be needed to recognize relevant and useful information.

  • The fourth step is to apply cleaning and optimization operations in order to reduce the volume of the retrieved information.
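
The four steps above can be sketched in Python as follows. This is a deliberately simplified illustration under stated assumptions, not the authors' system: each document is treated as a set of words, patterns are limited to word pairs (a full algorithm such as Apriori handles larger itemsets), the visual step is reduced to a text bar chart, and the stop-word list, thresholds, sample documents, and function names are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "in", "and", "is", "to"}  # illustrative list

def prepare(raw_docs):
    """Step 1: turn raw strings into sets of lowercase content words."""
    prepared = []
    for doc in raw_docs:
        words = re.findall(r"[a-z]+", doc.lower())
        prepared.append({w for w in words if w not in STOP_WORDS})
    return prepared

def mine_rules(transactions):
    """Step 2: count co-occurring word pairs and derive rules lhs -> rhs.

    Each rule is a tuple (lhs, rhs, support, confidence).
    """
    n = len(transactions)
    item_counts = Counter(w for t in transactions for w in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, count / n, count / item_counts[lhs]))
    return rules

def show(rules):
    """Step 3: present rule confidence as a crude text bar chart."""
    for lhs, rhs, support, confidence in rules:
        bar = "#" * round(confidence * 10)
        print(f"{lhs:>8} -> {rhs:<8} {bar} (support={support:.2f})")

def prune(rules, min_support=0.5, min_confidence=0.8):
    """Step 4: reduce the result set by keeping only strong rules."""
    return [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]

# Illustrative documents (an assumption, not data from the chapter).
raw = [
    "Text mining of the web",
    "Mining text and data",
    "Web data",
    "Text mining is data mining",
]

transactions = prepare(raw)
rules = mine_rules(transactions)
show(rules)
strong = prune(rules)
```

Here support is the fraction of documents containing both words, and confidence is the fraction of documents containing the left-hand word that also contain the right-hand word. With the sample documents, only the rules linking "text" and "mining" survive the pruning step.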

However, data processing techniques depend not only on these four steps but also on where the information is located and, in particular, on the algorithms and methods used.
