Text Mining by Pseudo-Natural Language Understanding

Ruqian Lu (Chinese Academy of Sciences, China)
Copyright: © 2009 |Pages: 5
DOI: 10.4018/978-1-60566-010-3.ch297

Text mining by pseudo-natural language understanding (TM by PNLU for short) is a technique developed by the AST group of the Chinese Academy of Sciences as part of the project on automatic knowledge acquisition by PNLU. It introduces a partial parsing technique to avoid the difficulty of full natural language understanding (NLU). It consists of three parts: PNL design, PNL parser implementation, and PNLU-based automatic knowledge acquisition. Its essence is twofold: a trade-off between information gain and feasibility of parsing, and a rational division of work between human and computer.
Main Focus

Definition of PNL

We write PNL for pseudo-natural language and PNLU for PNL understanding. The former denotes a class of languages, while the latter denotes a kind of technique for processing PNL. Generally speaking, a PNL looks very similar to natural language, but can be understood, analyzed, and compiled by a computer to the extent needed by a given application, for example compiling a textbook into a knowledge base for expert consultation.

Design of a PNL

The process of designing a PNL is as follows:

  1. Determine a set of semantic constructs of the application domain. For example, if the domain is mathematics, the semantic constructs are case frames that capture the sentence semantics frequently used in mathematics textbooks, such as concept definition, theorem proving, and exercise presentation.

  2. Select a natural language, e.g. English, as the background language.

  3. Look for sentence patterns in this language whose meanings correspond to the semantic constructs selected in the first step. Such patterns may look like "if * then * is called *" or "since * is true and * is not true we can infer from * that * is true".

  4. Organize the set of selected sentence patterns in grammar form and call it the key structure of the language, which is now called pseudo-natural. Because the patterns are governed by a grammar, not every combination of sentence patterns is a legal key structure. All parts marked with stars are skipped by any PNL parser. We call these parts “don’t care”.
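The way a parser can match the key words of a sentence pattern while leaving the starred parts unanalyzed can be pictured with a minimal Python sketch. This is not the AST group's parser; it only illustrates the idea using regular expressions. The pattern names `definition` and `inference`, the function names, and the choice to keep each "don't care" span as raw text are assumptions made for this illustration.

```python
import re

# Hypothetical pattern names; the two patterns are those quoted in step 3.
KEY_PATTERNS = {
    "definition": "if * then * is called *",
    "inference": "since * is true and * is not true we can infer from * that * is true",
}


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile a starred sentence pattern into a regular expression.

    Key words must match literally (case-insensitive). Every * becomes a
    lazy group whose content is kept as raw text and never analyzed.
    """
    pieces = [piece.strip() for piece in pattern.split("*")]
    parts = []
    for index, piece in enumerate(pieces):
        if piece:
            words = [re.escape(word) for word in piece.split()]
            parts.append(r"\b" + r"\s+".join(words) + r"\b")
        if index < len(pieces) - 1:
            parts.append(r"(.+?)")  # one "don't care" span
    # Allow optional final punctuation after the last key word or span.
    return re.compile(r"^\s*" + r"\s*".join(parts) + r"\s*[.;]?\s*$", re.IGNORECASE)


COMPILED = {name: compile_pattern(p) for name, p in KEY_PATTERNS.items()}


def match_sentence(sentence: str):
    """Return (pattern name, list of don't-care spans), or None if nothing fits."""
    for name, regex in COMPILED.items():
        found = regex.match(sentence)
        if found:
            return name, [span.strip() for span in found.groups()]
    return None


print(match_sentence("If a number has exactly two divisors then it is called a prime."))
# → ('definition', ['a number has exactly two divisors', 'it', 'a prime'])
```

Only the key words decide which semantic construct a sentence belongs to. The starred spans are carried along untouched, which is what keeps the parsing feasible without full natural language understanding.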

PNL Grammar

As an example, the following tiny grammar defines the key structure of a general classification statement:

〈Classification Sentence〉::=[〈Leading word〉〈Don’t care〉,]〈classification leading sentence〉

〈Leading word〉::=According to | Based on

〈classification leading sentence〉::=〈Don’t care〉〈classification word〉[〈number〉[main]〈type word〉[,〈sequence of Don’t care〉]] | There〈be word〉〈number〉[main]〈type word〉of〈Don’t care〉. They are〈sequence of Don’t care〉.

〈classification word〉::=〈be word〉classified into | 〈mood word〉be classified into

〈be word〉::= is | are

〈type word〉::= classes | types | sorts | kinds ……

This grammar recognizes sentences such as: Blood cells are classified into two types, red blood cells, white blood cells.
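A rough Python sketch of a recognizer for this grammar is given here as an illustration, not as the authors' implementation. The excerpt does not define 〈mood word〉 or 〈number〉, so the word lists `MOOD` and `NUMBER` are assumptions, as is the choice to split a 〈sequence of Don’t care〉 at commas. The function name `parse_classification` and the keys of the returned dictionary are hypothetical.

```python
import re

BE = r"(?:is|are)"
TYPE = r"(?:classes|types|sorts|kinds)"
LEAD = r"(?:According to|Based on)"
MOOD = r"(?:can|may|could|should)"  # assumption: not defined in the excerpt
NUMBER = r"(?:\d+|two|three|four|five|six|seven|eight|nine|ten)"  # assumption
CLASS_WORD = rf"(?:{BE}\s+classified\s+into|{MOOD}\s+be\s+classified\s+into)"
OPTIONAL_LEAD = rf"^(?:{LEAD}\s+(?P<basis>[^,]+?)\s*,\s*)?"

# First alternative: <Don't care> <classification word> [<number> [main] <type word> [, ...]]
FIRST_ALT = re.compile(
    OPTIONAL_LEAD
    + rf"(?P<subject>.+?)\s+{CLASS_WORD}"
    + rf"(?:\s+(?P<number>{NUMBER})\s+(?:main\s+)?{TYPE}"
    + r"(?:\s*,\s*(?P<members>.+?))?)?\s*\.?\s*$",
    re.IGNORECASE,
)

# Second alternative: There <be word> <number> [main] <type word> of <Don't care>. They are ...
SECOND_ALT = re.compile(
    OPTIONAL_LEAD
    + rf"There\s+{BE}\s+(?P<number>{NUMBER})\s+(?:main\s+)?{TYPE}\s+of\s+"
    + r"(?P<subject>.+?)\s*\.\s*They\s+are\s+(?P<members>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)


def parse_classification(sentence: str):
    """Return the key-structure slots of a classification sentence, or None."""
    for pattern in (SECOND_ALT, FIRST_ALT):
        found = pattern.match(sentence.strip())
        if found:
            members = found.group("members")
            return {
                "basis": found.group("basis"),      # don't-care span after the leading word
                "subject": found.group("subject"),  # don't-care span being classified
                "number": found.group("number"),
                # Assumption: items of the sequence are separated by commas.
                "members": [m.strip() for m in members.split(",")] if members else [],
            }
    return None


print(parse_classification(
    "Blood cells are classified into two types, red blood cells, white blood cells."
))
```

The recognizer checks only the key words (`classified into`, `types`, and so on). Everything in a "don't care" position is returned as uninterpreted text.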
