Knowledge discovery refers to the process of extracting new, interesting, and useful knowledge from data and presenting it in an intelligible way to the user. Roughly, knowledge discovery can be considered a three-step process: preprocessing data; data mining, in which the actual exploratory work is done; and interpreting the results to the user. Here, I focus on the data-mining step, assuming that a suitable set of data has been chosen properly. The patterns that we search for in the data are plausible relationships, which agents may use to establish cognitive links for reasoning. Such plausible relationships can be expressed via association rules. Usually, the criteria to judge the relevance of such rules are either frequency based (Bayardo & Agrawal, 1999) or causality based (for Bayesian networks, see Spirtes, Glymour, & Scheines, 1993). Here, I will pursue a different approach that aims at extracting what can be regarded as structures of knowledge — relationships that may support the inductive reasoning of agents and whose relevance is founded on information theory. The method that I will sketch in this article takes numerical relationships found in data and interprets these relationships as structural ones, using mostly algebraic techniques to elaborate structural information.
This article presents an approach to discover association rules that are most relevant with respect to the maximum entropy methods. Because entropy is related to information, this approach can be considered as aiming to find the most informative rules in data. The basic idea is to exploit numerical relationships that are observed by comparing (relative) frequencies, or ratios of frequencies, and so forth, as manifestations of interactions of underlying conditional knowledge.
My approach differs from usual knowledge discovery and data-mining methods in various respects:
It explicitly takes the instrument of inductive inference into consideration.
It is based on statistical information but not on probabilities close to 1; actually, it mostly uses only structural information obtained from the data.
It is not based on observing conditional independencies (as for learning causal structures), but aims at learning relevant conditional dependencies in a nonheuristic way.
As a further novelty, it does not compute single, isolated rules, but yields a set of rules by taking into account highly complex interactions of rules.
Zero probabilities computed from data are interpreted as missing information, not as certain knowledge.