Searching and Mining with Semantic Categories


Brahim Djioua (University of Paris-Sorbonne, France), Jean-Pierre Desclés (University of Paris-Sorbonne, France) and Motasem Alrahabi (University of Paris-Sorbonne, France)
DOI: 10.4018/978-1-4666-0330-1.ch006


A new model is proposed for retrieving information by automatically building a semantic metatext structure for texts, which allows discourse and semantic information to be searched and extracted according to certain linguistic categorizations. This paper presents approaches for searching and mining full text with semantic categories. The model is built from two engines. The first, called EXCOM (Djioua et al., 2006; Alrahabi, 2010), is an automatic text annotation system based on discourse and semantic maps, which are specifications of general linguistic ontologies founded on Applicative and Cognitive Grammar. The annotation layer uses a linguistic method called Contextual Exploration, which handles the polysemic values of a term in texts. Several ‘semantic maps’, each underlying a ‘point of view’ for text mining, guide this automatic annotation process. The second engine uses the semantically annotated texts produced by the first to create a semantic inverted index, which can retrieve relevant documents for queries associated with discourse and semantic categories such as definition, quotation, causality, relations between concepts, etc. (Djioua & Desclés, 2007). This semantic indexing process builds a metatext layer for textual contents. Some data and linguistic rule sets, as well as the general architecture that extends third-party software, are provided as supplementary information.
Chapter Preview

Semantic Search Engine Or Question-Answering System?

It is commonplace to observe that traditional search engines deal with terms for organizing the index and with numbers for the quantity of documents indexed and made available to a search. It is also usual to regard any system that identifies specific information not obtainable through keyword queries as a question-answering system. But in the standard information retrieval paradigm, the user is provided with a ranked list of references to documents thought to contain the information needed, and must then search through those documents to satisfy his needs. Another approach to meeting the user’s information need in a more focused way is to provide specific answers to specific questions. Search engines, as information retrieval systems par excellence, can be thought of as allowing users to satisfy information needs. The main limitation of this paradigm is that it requires the users’ involvement to identify the information they require: they must (1) express their needs by keywords and (2) read through the documents to find the information they were looking for.

Information retrieval on the Web today makes little use of natural language processing (NLP). The perceived value of improved understanding is greatly outweighed by the practical difficulty of storing complex linguistic annotations in a scalable indexing and search framework. Linguistics can help to identify textual categorizations automatically; organized as points of view for text mining, these categorizations can satisfy users’ needs. Our search engine tries to take advantage of both classical IR and QA systems. It acts like a classic search engine in that a user formulates a query with terms and semantic categories, and the IR system answers by providing a list of references to documents containing textual segments (sentences, paragraphs, …) identified as discourse and semantic relations (causality, definition, quotation, …). It resembles a QA system in that it provides precise information, so the user does not have to explore the document contents to satisfy his targeted needs. Unlike a QA system, however, it does not use any knowledge database and does not process users’ queries as natural language expressions.
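
The query model described above, terms combined with a semantic category and answered with references to annotated textual segments, can be illustrated with a small sketch. This is not the authors' implementation: the `Segment` and `SemanticIndex` names, the tokenizer, and the sample sentences are all hypothetical, and the category labels are assumed to have been assigned beforehand by an annotation engine such as EXCOM.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


def tokenize(text):
    """Lowercase word tokenizer (an assumption; any tokenizer would do)."""
    return re.findall(r"\w+", text.lower())


@dataclass(frozen=True)
class Segment:
    """A textual segment plus the categories an annotator assigned to it."""
    doc_id: str
    seg_id: int
    text: str
    categories: frozenset


class SemanticIndex:
    """Toy inverted index over both terms and semantic categories."""

    def __init__(self):
        self._by_term = defaultdict(set)      # term -> segment keys
        self._by_category = defaultdict(set)  # category -> segment keys
        self._segments = {}                   # segment key -> Segment

    def add(self, segment):
        key = (segment.doc_id, segment.seg_id)
        self._segments[key] = segment
        for term in tokenize(segment.text):
            self._by_term[term].add(key)
        for category in segment.categories:
            self._by_category[category].add(key)

    def search(self, terms, category):
        """Return segments tagged with `category` that contain every term."""
        hits = set(self._by_category.get(category, set()))
        for term in terms:
            hits &= self._by_term.get(term.lower(), set())
        return [self._segments[key] for key in sorted(hits)]


# Hypothetical annotated segments (labels assumed to come from the annotator).
index = SemanticIndex()
index.add(Segment("doc1", 0,
                  "An ontology is a formal specification of a conceptualization.",
                  frozenset({"definition"})))
index.add(Segment("doc1", 1,
                  "The ontology was updated last year.",
                  frozenset()))
index.add(Segment("doc2", 0,
                  "Heavy rain caused the flooding.",
                  frozenset({"causality"})))

# Query: the term "ontology" restricted to segments annotated as definitions.
for seg in index.search(["ontology"], "definition"):
    print(seg.doc_id, seg.seg_id, seg.text)
```

The sketch returns segments rather than whole documents, so a user sees the sentence that carries the requested relation without reading the full text. It uses no knowledge base and does no natural language query parsing, which matches the constraints stated in the paragraph above.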
