Attempting to Model Sense Division for Word Sense Disambiguation

Pascual Cantos Gómez
DOI: 10.4018/978-1-60566-650-1.ch007


This chapter starts by exploring the potential of co-occurrence data for word sense disambiguation. The finding that co-occurrence data are distributed robustly and differently across word meanings, on the assumption that distinct meanings of the same word attract different co-occurrence data, has led the author to experiment (i) with grouping word meanings by means of cluster analysis and (ii) with word sense disambiguation using discriminant function analysis. In addition, two priorities have been pursued: first, to find robust statistical techniques, and second, to minimize computational costs. Future research aims at the transition from coarse-grained senses to finer-grained ones by reiterating the same model at different levels of contextual differentiation.
Chapter Preview

Background And State-Of-The-Art

Word sense disambiguation (WSD) is the problem of determining which sense or meaning of a word is activated by the use of the word in a particular context, a process which appears to be largely unconscious in people. WSD is a natural classification problem: Given a word and its possible meanings, as defined by a dictionary, classify an occurrence of the word in context into one or more of its sense classes. The features of the neighbouring words (Bar-Hillel 1960) provide the evidence for classification.

The 121 most frequent English nouns, which account for about one in five word occurrences in real text, have on average 7.8 meanings each (Miller 1990; Ng and Lee 1996). But the potential for ambiguous readings tends to go completely unnoticed in normal text and flowing conversation. The effect is so strong that some people will even miss a real ambiguity that is obvious to others. Words may be polysemous in principle, but in actual text there is very little real ambiguity to a person.

WSD is related to other fields such as lexical semantics, whose main endeavour is to “understand” the relationships between “word”, “meaning”, and “context”. But even though word meaning is at the heart of the problem, WSD has never really been a key issue in lexical semantics. It could be that lexical semantics has always been more concerned with representational issues (Lyons 1995) and with models of word meaning and polysemy that have so far been too complex for WSD (Cruse 1986; Ravin and Leacock 2000). And so the obvious procedural or computational nature of WSD, paired with its early invocation in the context of machine translation (Weaver 1949/1955), has allied it more closely with language technology and thus with computational linguistics. In fact, WSD has more in common with modern lexicography, with its intuitive premise that word uses group into coherent semantic units and its empirical corpus-based approaches, than with lexical semantics (Wilks et al. 1996).

Key Terms in this Chapter

Span: The span or window size refers to the amount of co-textual data—left and right of the node word—that we are considering for our investigation. It is common practice to take a span of -5 +5, that is, up to 5 words to the left of the KWIC and 5 words to its right.
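
Such a window can be sketched in a few lines of Python. The function name, tokenization (plain whitespace split) and example data below are illustrative only, not part of the chapter's model:

```python
def span_window(words, node, left=5, right=5):
    """Return (left context, node, right context) for each occurrence
    of the node word, using a -5/+5 span by default."""
    hits = []
    for i, w in enumerate(words):
        if w == node:
            hits.append((words[max(0, i - left):i],
                         w,
                         words[i + 1:i + 1 + right]))
    return hits

tokens = "buying her first computer heron taught herself to trade online".split()
print(span_window(tokens, "computer"))
```

The context lists are clipped at sentence boundaries, so an occurrence near the start of the text simply gets a shorter left context.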

Collocation: Within the area of corpus linguistics, a collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance within the context of a specific word (node/KWIC). Collocation refers to the restrictions on how words can be used together, for example which prepositions are used with particular verbs, or which verbs and nouns are used together. Collocations are examples of lexical units. Collocations should not be confused with idioms. For example, let us consider again the sentence “Buying her first computer, Heron taught herself to trade online.” A statistical analysis reveals that, among the co-occurrences, only one occurs within the context of computer more often than would be expected by chance, namely online. So we conclude that online is a collocation of computer.
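
As a toy sketch of the "more often than expected by chance" idea, one can compare the observed co-occurrence count within the span against the count expected if words were scattered at random. The plain observed/expected ratio below is an illustrative simplification, not the chapter's actual statistic, and the miniature corpus is invented:

```python
from collections import Counter

def collocation_strength(tokens, node, span=5):
    """Observed co-occurrence counts within +/-span of the node word,
    divided by the count expected under random distribution."""
    N = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, w in enumerate(tokens):
        if w != node:
            continue
        lo, hi = max(0, i - span), min(N, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                observed[tokens[j]] += 1
    # expected = word frequency * node frequency * window size / corpus size
    return {w: obs / (freq[w] * freq[node] * 2 * span / N)
            for w, obs in observed.items()}

corpus = ("a computer runs online . " * 3 + "a dog runs fast . " * 3).split()
scores = collocation_strength(corpus, "computer")
# "online" scores above 1 (more often than chance); "dog" scores below 1.
```

A ratio above 1 flags a candidate collocation; real studies would use a significance test (e.g. t-score or log-likelihood) rather than the raw ratio.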

Lexical Ambiguity: Lexical ambiguity arises when context is insufficient to determine the sense of a single word that has more than one meaning. For example, the word “bank” has several distinct definitions, including “financial institution” and “edge of a river,” but if someone says “I deposited €500 in the bank,” most people would not think the speaker used a shovel to dig in the mud.

Cluster Analysis: Cluster analysis encompasses a number of different algorithms and methods for grouping objects of a similar kind into respective categories. A general question facing researchers in many areas of inquiry is how to organize observed data into meaningful structures, that is, how to develop taxonomies. In other words, cluster analysis is an exploratory data analysis tool which aims at sorting different objects into groups such that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise.
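
A minimal sketch of one such algorithm, agglomerative single-linkage clustering, over hypothetical two-dimensional co-occurrence profiles (the item names and vectors are invented for illustration and do not come from the chapter's data):

```python
import math

# Hypothetical co-occurrence profiles as simple 2-D vectors: two uses
# of "bank" and two river-related words.
points = {"bank_1": (0.9, 0.1), "bank_2": (0.8, 0.2),
          "river": (0.1, 0.9), "shore": (0.2, 0.8)}

# Start with each item in its own cluster, then repeatedly merge the
# two closest clusters (single linkage) until two clusters remain.
clusters = [[name] for name in points]
while len(clusters) > 2:
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(math.dist(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    clusters[i] += clusters.pop(j)

print(sorted(sorted(c) for c in clusters))
```

The two "bank" profiles end up in one group and the river-related words in the other, mirroring how similar co-occurrence profiles are meant to pull word meanings together.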

Concordance: A concordance is a special type of visual display of a sentence or list of sentences which are aligned according to the word under investigation, shown with their immediate contexts.

Co-Occurrence: A co-occurrence is defined as a word/term or sequence of words/terms which simply co-occurs with another word or term. For example, consider the following sentence: “Buying her first computer, Heron taught herself to trade online.” If we take the word computer, all other words—buying, her, first, Heron, taught, herself, to, trade, online—are seen as co-occurrences of computer.
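
The example above can be reproduced in a couple of lines (whitespace tokenization, punctuation dropped; purely illustrative):

```python
sentence = "Buying her first computer , Heron taught herself to trade online"
tokens = [t.lower() for t in sentence.split() if t.isalpha()]
cooccurrences = [t for t in tokens if t != "computer"]
print(cooccurrences)
# every word other than the node "computer" counts as a co-occurrence
```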

Gravity: Gravity represents the extent of the influence of the node word (KWIC) on its immediate environment. In other words, gravity shows which co-occurrences/collocations contribute most to the meaning of the node word. The calculation of gravity is determined by the relative position/distance of the co-occurrences/collocations with respect to the node word and by their frequency.
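
One simple, hypothetical way to operationalize such a distance-and-frequency weighting—not necessarily the formula used in the chapter—is to weight each co-occurrence by the inverse of its distance from the node and sum over all occurrences:

```python
from collections import Counter

def gravity_scores(sentences, node, span=5):
    """Sum 1/distance for every co-occurrence of each word with the
    node word, within the given span (an illustrative weighting)."""
    scores = Counter()
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            if w != node:
                continue
            for j in range(max(0, i - span), min(len(words), i + span + 1)):
                if j != i:
                    scores[words[j]] += 1.0 / abs(j - i)
    return scores

sents = ["she bought a computer online", "the computer crashed online"]
print(gravity_scores(sents, "computer").most_common(3))
```

Words that appear often and close to the node accumulate high scores, which is the intuition behind gravity: both proximity and frequency pull a co-occurrence toward the node's meaning.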

Word Sense Disambiguation: In computational linguistics, word sense disambiguation is the process of identifying which sense of a word—one having a number of distinct senses—is used in a given sentence. For example, consider the word corn, three distinct senses of which are: (1) seed of any of various grains, chiefly wheat, oats, rye and maize, or such plants while growing; (2) music, verse, drama, etc. that is banal, sentimental or hackneyed; (3) a small, often painful area of hardened skin on the foot, esp. on a toe; and the sentence “This romantic ballad is pure corn.” To a human it is obvious that this sentence uses corn in the second sense. Although this seems obvious to a human, developing algorithms to replicate this ability is a difficult task.
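
The corn example can be approximated with a Lesk-style gloss-overlap sketch. The gloss strings below paraphrase the three senses, and this dictionary-overlap approach is a classic illustration of WSD, not the chapter's own model:

```python
# Hypothetical gloss texts for the three senses of "corn" above.
senses = {
    1: "seed of any of various grains chiefly wheat oats rye and maize",
    2: "music verse drama that is banal sentimental or hackneyed",
    3: "small often painful area of hardened skin on the foot or toe",
}

def disambiguate(sentence, senses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(sentence.lower().split())
    return max(senses, key=lambda s: len(context & set(senses[s].split())))

print(disambiguate("This romantic ballad is pure corn", senses))
```

Even this crude overlap count points toward sense 2 here, but it breaks down quickly on short glosses and sparse contexts, which is why statistical approaches such as those in this chapter are needed.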

Discriminant Function Analysis: Discriminant function analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, a medical researcher may record different variables relating to patients’ backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). Discriminant function analysis could then be used to determine which variable(s) are the best predictors of patients’ subsequent recovery.
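A minimal two-group Fisher discriminant can be sketched in plain Python; the 2-D data points are invented (e.g. two co-occurrence features for two senses of a word), and a real analysis would use a statistics package:

```python
# Hypothetical 2-D observations for two groups (e.g. two senses).
group1 = [(2.0, 3.0), (3.0, 3.5), (2.5, 2.8), (3.2, 3.9)]
group2 = [(6.0, 1.0), (7.0, 1.5), (6.5, 0.8), (7.2, 1.2)]

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    # 2x2 within-group scatter matrix
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return [[sxx, sxy], [sxy, syy]]

m1, m2 = mean(group1), mean(group2)
S1, S2 = scatter(group1, m1), scatter(group2, m2)
Sw = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]

# Fisher direction w = Sw^-1 (m1 - m2); invert the 2x2 matrix by hand.
det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
inv = [[Sw[1][1] / det, -Sw[0][1] / det],
       [-Sw[1][0] / det, Sw[0][0] / det]]
d = (m1[0] - m2[0], m1[1] - m2[1])
w = (inv[0][0] * d[0] + inv[0][1] * d[1],
     inv[1][0] * d[0] + inv[1][1] * d[1])

def project(p):
    return w[0] * p[0] + w[1] * p[1]

# Classify a new point by which side of the midpoint between the
# projected group means it falls on.
threshold = (project(m1) + project(m2)) / 2

def classify(p):
    return 1 if (project(p) - threshold) * (project(m1) - threshold) > 0 else 2
```

The projection direction maximizes between-group separation relative to within-group spread, which is exactly the property that makes discriminant functions useful for assigning an ambiguous word occurrence to one sense group or another.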
