Linguistic Data Summarization: A High Scalability through the Use of Natural Language?

Janusz Kacprzyk (Polish Academy of Sciences, Poland) and Sławomir Zadrożny (Polish Academy of Sciences, Poland)
DOI: 10.4018/978-1-60566-858-1.ch008

The authors discuss the scalability of data mining tools in a sense different from the usual one, which asks whether a tool retains its intended functionality as the problem size increases. They introduce a new concept of cognitive (perceptual) scalability: whether, as the problem size increases, the method remains fully functional in the sense of still providing intuitively appealing and comprehensible results to the human user. The authors argue that the use of natural language in linguistic data summaries provides high cognitive (perceptual) scalability, because natural language is the only fully natural means of human communication and provides a common language for individuals and groups of different backgrounds, skills, and knowledge. They show that the use of Zadeh’s protoforms as general representations of linguistic data summaries, proposed by Kacprzyk and Zadrożny (2002; 2005a; 2005b), amplifies this advantage, leading to an ultimate cognitive (perceptual) scalability.
Chapter Preview


The purpose of this paper is to present a novel argument for the usefulness and power of linguistic data(base) summarization, the essence of which was proposed by Yager (1982), with an extended, implementable version given by Kacprzyk & Yager (2001) and Kacprzyk, Yager & Zadrożny (2000).

We consider those of our further developments of the basic solutions presented in these papers that are relevant to our discussion, notably:

  • a close relation between linguistic data summarization and fuzzy database querying, more specifically fuzzy queries with linguistic quantifiers, which we proposed in (Kacprzyk & Ziółkowski, 1986), presented in a much more extended form in (Kacprzyk, Zadrożny & Ziółkowski, 1989), and developed further still in FQUERY for Access (Kacprzyk & Zadrożny, 2001b),

  • our general approach to linguistic data summarization, viewed as an interactive process in which fuzzy querying makes it possible to articulate the user’s intentions, interests and information needs, as proposed by Kacprzyk & Zadrożny (1998; 2001a), and

  • our formulation of linguistic data summarization in terms not only of the calculus of linguistically quantified propositions but also of Zadeh’s protoforms (cf. Kacprzyk & Zadrożny, 2002; 2005a; 2005b), which can provide extraordinary transparency, versatility and generality.
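To make the calculus of linguistically quantified propositions mentioned above more concrete, the following is a minimal sketch, not the implementation used in the cited papers. It assumes the commonly used formula for the truth of a summary "Q records are S", namely T = mu_Q((1/n) * sum of mu_S(y_i)), where Q is a fuzzy linguistic quantifier and S a fuzzy summarizer. The membership functions for "most" and "young", the sample ages, and all function names are hypothetical and chosen only for illustration.

```python
def mu_most(x):
    """Membership of a proportion x in [0, 1] in the quantifier 'most'.

    Assumed piecewise-linear shape: 0 up to 0.3, 1 from 0.8 on.
    """
    if x >= 0.8:
        return 1.0
    if x <= 0.3:
        return 0.0
    return 2.0 * x - 0.6


def mu_young(age):
    """Assumed membership for the summarizer 'young'.

    Fully 'young' up to age 25, not 'young' at all from age 40.
    """
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15.0


def truth_of_summary(values, mu_summarizer, mu_quantifier):
    """Truth degree of 'Q records are S' via the mean-membership formula."""
    if not values:
        raise ValueError("need at least one record")
    # Proportion of records satisfying S, counted in the fuzzy sense.
    proportion = sum(mu_summarizer(v) for v in values) / len(values)
    return mu_quantifier(proportion)


# Hypothetical data: ages of six employees.
ages = [22, 24, 31, 45, 28, 23]
truth = truth_of_summary(ages, mu_young, mu_most)
print(f"T('most employees are young') = {truth:.2f}")
```

Whatever the number of records, the result is a single short sentence with a degree of truth, which is the property the scalability argument in this paper relies on.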

Our purpose in this paper is not, however, a traditional exposition of the essence of those ideas, which have been presented in our papers referred to above and which have proved to be very effective and efficient. Instead, we discuss these tools and techniques from the perspective of this volume, that is, the scalability of data mining (knowledge discovery) tools and techniques. In the case of linguistic data(base) summarization, this has two kinds of aspects: the more technical aspects, related to computation time and memory, of the scalability of databases and querying, and the more conceptual aspects of what might be called the cognitive or perceptual scalability of tools from the point of view of human faculties and capabilities. Ultimately, we argue that linguistic data summarization may be viewed, notably with respect to cognitive (perceptual) scalability, as an ultimately scalable tool for data mining and knowledge discovery.



The first question we should ask is: what is actually meant by scalability, in particular in the context of broadly perceived information technology? Usually, scalability is understood in two basic ways. First, it is the ability of a computer application or system (i.e. hardware and/or software) to continue to function when the size of the problem in question (e.g. the size of a computer network, the number of clients, the size of data sets, etc.) changes, usually by growing. In our context of broadly perceived data analysis, scalability will be meant in this paper in the upward sense. Second, in a modern view, scalability is the ability of a computer application and/or system not only to function as the size of the problem and/or context increases (or decreases, but this case will not be considered) but even to take advantage of that increase in size and volume, for instance to provide more adequate results because of a larger underlying data set, or to grasp more adequately the very essence of a larger data set. Needless to say, scalability is a desirable property of any application or system, and virtually all nontrivial applications and systems are designed and implemented with scalability in mind.
