Protoforms of Linguistic Database Summaries as a Human Consistent Tool for Using Natural Language in Data Mining

Janusz Kacprzyk (Polish Academy of Sciences, Poland) and Slawomir Zadrozny (Polish Academy of Sciences, Poland)
Copyright: © 2012 | Pages: 12
DOI: 10.4018/978-1-4666-0261-8.ch010

Abstract

We consider linguistic database summaries in the sense of Yager (1982), in an implementable form proposed by Kacprzyk & Yager (2001) and Kacprzyk, Yager & Zadrozny (2000). For a personnel database, an example of such a summary is “most employees are young and well paid” (with some degree of truth). We treat these summaries and their extensions as a very general tool for human-consistent summarization of large data sets. We advocate the use of the concept of a protoform (prototypical form), strongly promoted by Zadeh and shown by Kacprzyk & Zadrozny (2005) to be a general form of a linguistic data summary. We then present an extension of our interactive approach to fuzzy linguistic summaries, based on fuzzy logic and fuzzy database queries with linguistic quantifiers. We show how fuzzy queries are related to linguistic summaries, and that one can introduce a hierarchy of protoforms, or abstract summaries, in the sense of Zadeh’s (2002) recent ideas, which are meant mainly to increase the deduction capabilities of search engines. We show an implementation for the summarization of Web server logs.
Chapter Preview

Linguistic Summaries Using Fuzzy Logic With Linguistic Quantifiers

In Yager’s (1982) approach, we have:

  • V is a quality (attribute) of interest, e.g. salary in a database of workers,

  • Y = {y1, ..., yn} is a set of objects (records) that manifest quality V, e.g. the set of workers; hence V(yi) are values of quality V for object yi,

  • D = {V(y1), ..., V(yn)} is a set of data (the “database” in question).

A linguistic summary of a data set (database) consists of:

  • a summarizer S (e.g. young),

  • a quantity in agreement Q (e.g. most),

  • a truth (validity) T of the summary (e.g. 0.7),

  • a qualifier R (optionally), i.e. another linguistic term (e.g. well-earning), determining a fuzzy subset of Y.
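To make the roles of S, Q, R and T concrete, the following minimal sketch computes the truth T of a summary such as “most employees are young” using Zadeh’s calculus of linguistically quantified propositions: T is the membership of the (fuzzy) proportion of objects satisfying S in the quantifier Q, and with a qualifier R the proportion is taken relative to R. The preview above does not give these formulas or any membership functions, so the piecewise-linear shapes of “young”, “well paid” and “most”, the helper names, and the sample data are all assumptions made for illustration only.

```python
from typing import Callable, Optional, Sequence

Membership = Callable[[float], float]


def ramp_up(zero: float, full: float) -> Membership:
    """Membership rising linearly from 0 at `zero` to 1 at `full`."""
    def mu(x: float) -> float:
        if x <= zero:
            return 0.0
        if x >= full:
            return 1.0
        return (x - zero) / (full - zero)
    return mu


def ramp_down(full: float, zero: float) -> Membership:
    """Membership equal to 1 up to `full`, falling linearly to 0 at `zero`."""
    def mu(x: float) -> float:
        if x <= full:
            return 1.0
        if x >= zero:
            return 0.0
        return (zero - x) / (zero - full)
    return mu


def truth_of_summary(
    mu_s: Sequence[float],
    quantifier: Membership,
    mu_r: Optional[Sequence[float]] = None,
) -> float:
    """Truth T of "Q y's are S" or, with a qualifier, "Q R y's are S".

    mu_s[i] and mu_r[i] are the degrees to which object y_i satisfies
    the summarizer S and the qualifier R, respectively.
    """
    if not mu_s:
        raise ValueError("the data set must contain at least one object")
    if mu_r is None:
        proportion = sum(mu_s) / len(mu_s)
    else:
        if len(mu_r) != len(mu_s):
            raise ValueError("mu_s and mu_r must have the same length")
        total_r = sum(mu_r)
        if total_r == 0:
            return 0.0  # convention assumed here: no object satisfies R
        # Intersection of R and S modeled with the minimum t-norm.
        proportion = sum(min(r, s) for r, s in zip(mu_r, mu_s)) / total_r
    return quantifier(proportion)


# Hypothetical linguistic terms and data (not taken from the chapter).
young = ramp_down(30, 40)        # age in years
well_paid = ramp_up(4000, 6000)  # salary in arbitrary currency units
most = ramp_up(0.3, 0.8)         # relative quantifier on [0, 1]

ages = [25, 30, 35, 50]
salaries = [3000, 5000, 6000, 7000]

mu_young = [young(a) for a in ages]
mu_well_paid = [well_paid(s) for s in salaries]

t_simple = truth_of_summary(mu_young, most)
t_qualified = truth_of_summary(mu_young, most, mu_well_paid)
```

With these assumed definitions, `t_simple` (for “most employees are young”) comes out at roughly 0.65, and `t_qualified` (for “most well-paid employees are young”) at roughly 0.2. Other membership shapes or another t-norm in place of the minimum would give different values.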
