From Observable Effects To Unobservable Causes
As we have discussed in previous chapters, an artificial neural network is an information-processing system that maps a descriptive feature vector into a class assignment vector. In so doing, a neural network is nothing more than a complex and intrinsically nonlinear statistical classifier. It extracts the statistical central tendency of a series of exemplars (the learning set) and thus comes to encode information not just about the specific exemplars, but about the stereotypical feature-set displayed in the training data (Churchland, 1989; Clark, 1989, 1993; Franklin, 1995). That is, it will discover which features are most commonly present in the exemplars, and which groupings of features commonly occur together. In this way, semantic features that are statistically frequent in a set of learning exemplars come to be both highly marked and mutually associated. “Highly marked” means that the connection weights encoding such common features tend to be quite strong. “Mutually associated” means that co-occurring features are encoded in such a way that the activation of one of them will promote the activation of the other.
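As an illustration, here is a minimal sketch (assuming NumPy and a toy set of binary exemplars invented for the example, not any particular network discussed in the text) of how a simple Hebbian co-occurrence rule yields weights that are “highly marked” for frequent features and “mutually associated” for features that tend to appear together:

```python
import numpy as np

# Toy binary exemplars (rows) over three semantic features (columns).
# Features 0 and 1 co-occur in most exemplars; feature 2 is rare.
exemplars = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
])

# Hebbian co-occurrence rule: the weight linking two features grows with the
# frequency of their joint activation across the learning set.
n_features = exemplars.shape[1]
weights = np.zeros((n_features, n_features))
for x in exemplars:
    weights += np.outer(x, x)
np.fill_diagonal(weights, 0)       # no self-connections
weights /= len(exemplars)          # normalise by the number of exemplars

print(weights)
# The weight between features 0 and 1 (0.8) is far stronger than the weights
# linking either of them to feature 2 (0.2): the frequent pair is "highly marked".

# "Mutually associated": presenting feature 0 alone promotes feature 1.
partial_input = np.array([1, 0, 0])
print(weights @ partial_input)     # feature 1 receives the strongest drive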
As a learning mechanism, a neural network looks as if it explicitly generates and stores prototypes of, for example, the typical stone knife of this period, the typical burial practice in this community, or the typical social organization in this period and place. However, there are no such explicit, stored items. What exist are sets of connection weights (or, in the biological case, synaptic efficacies). The prototype is not a thing stored at some specific place within the network, nor an ideal representation of reality waiting to be retrieved by a stimulus. Its extraction arises as an emergent consequence of the learning process, given a proper selection of characteristic features or input variables.
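The same point can be sketched with a single unit trained by a simple moving-average rule on invented artefact measurements (the centre, noise level, and learning rate below are illustrative assumptions): the “prototype” is never stored anywhere, yet the weights settle on the central tendency of the exemplars.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented exemplars: three measurements per artefact, scattered around a
# central tendency that the network is never shown as such.
true_centre = np.array([4.0, 2.5, 7.0])
exemplars = true_centre + rng.normal(0.0, 0.5, size=(200, 3))

# One unit with a weight vector, adjusted exemplar by exemplar.
# The only thing that ever exists in the system is this set of weights.
w = rng.normal(size=3)
eta = 0.05
for x in exemplars:
    w += eta * (x - w)     # nudge the weights a little toward each exemplar

print("final weights:", np.round(w, 2))
print("sample mean:  ", np.round(exemplars.mean(axis=0), 2))
# The weights end up close to the central tendency of the training set: the
# "prototype" is nothing but the settled configuration of connection weights.
```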
A prototype as formed within a neural network is by definition “general,” in the same sense in which a property is general: it has many instances, and it can represent a wide range of diverse examples. However, this generality does not mean that prototypes are universal generalizations. No prototype feature needs to be universal, or even nearly universal, to all examples in the class. Furthermore, prototypes allow us a welcome degree of looseness precluded by the strict logic of the universal quantifier: not all Fs need to be Gs, but the standard or normal ones are, and the non-standard ones must be related by a relevant similarity relation to those that properly are G.
Different neurons represent different “prototypical values” along a given feature continuum, and respond with graded signals reflecting how close the current exemplar is to their preferred value. Note that what is really being stored is the degree to which one neuron, representing a micro-feature of the final concept or prototype, predicts another neuron or micro-feature. Thus, what is learned is that whenever a certain configuration of micro-features is present, a certain other set of micro-features is also present (Rumelhart, 1989). This is important, because it means that the system does not fall into the trap of needing to decide which category a pattern belongs to before knowing which prototype it should be averaged into. The acquisition of the different prototypes proceeds without any sort of explicit categorization. If the patterns are sufficiently dissimilar, there is no interference among them at all, as the sketch below suggests.
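The graded, preference-based response described above can be sketched with a bank of units whose preferred values and Gaussian tuning width are illustrative assumptions (here, a single continuous feature such as blade length):

```python
import numpy as np

# A bank of units tuned to different "prototypical values" along one
# continuous feature dimension (say, blade length in cm). The preferred
# values and tuning width below are illustrative assumptions.
preferred = np.linspace(2.0, 12.0, 6)    # each unit's preferred value
width = 1.5                              # tuning width of every unit

def population_response(stimulus_value):
    """Graded (Gaussian) activation of every unit for one observed value."""
    return np.exp(-0.5 * ((stimulus_value - preferred) / width) ** 2)

for length in (4.0, 9.5):
    print(length, np.round(population_response(length), 2))
# A unit whose preferred value lies near the exemplar responds strongly;
# units tuned to distant values barely respond, so sufficiently dissimilar
# exemplars activate almost disjoint sets of units and do not interfere.
```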
It is clear that a single prototype represents a wide range of quite different possible inputs: it represents the extended family of relevant features that collectively unite the relevant class of stimuli into a single category. Any member of that diverse class of stimuli will activate the entire prototype. In addition, any other input stimulus that is similar to the members of that class, in part or completely, will activate a pattern that is fairly close to the prototype. Consequently, the prototype vector activated by any given visual stimulus will reflect the accumulated interactions with all the possible sources of the same or similar stimuli, in proportion to the frequency with which they have been experienced.
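A final sketch, assuming a Hebbian autoassociator built from invented noisy exemplars, illustrates how an input that is only partly similar to the training examples can nevertheless drive the network into a pattern close to the full prototype:

```python
import numpy as np

rng = np.random.default_rng(2)

# A stereotype encoded only implicitly, through Hebbian weights built from
# twenty noisy exemplars (each exemplar flips one randomly chosen feature).
prototype = np.array([1, 1, -1, 1, -1, -1, 1, -1], dtype=float)

def noisy_copy(p, n_flips):
    x = p.copy()
    x[rng.choice(len(p), size=n_flips, replace=False)] *= -1
    return x

exemplars = [noisy_copy(prototype, 1) for _ in range(20)]
W = sum(np.outer(x, x) for x in exemplars) / len(exemplars)
np.fill_diagonal(W, 0)

# A new stimulus only partly similar to the training exemplars:
# a quarter of its features are distorted.
stimulus = noisy_copy(prototype, 2)
state = stimulus.copy()
for _ in range(5):                                  # let activation settle
    state = np.where(W @ state >= 0, 1.0, -1.0)

print("stimulus overlap with prototype:", int(np.sum(stimulus == prototype)), "/ 8")
print("settled overlap with prototype: ", int(np.sum(state == prototype)), "/ 8")
# The partially similar input typically settles into the full prototype
# pattern: similarity in, stereotype out.
```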