WORDNET: An On-line Lexical Database

 

 

Kadri Vider and Heili Orav

University of Tartu

Department of Estonian

kadriv@ut.ee horav@psych.ut.ee

 

 

Abstract

 

WordNet (WN) is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets (synsets), each representing one underlying lexical concept. Different kinds of lexical relations link the synsets.

In 1990 WN contained approximately 54,000 different lexical entries organized into some 48,000 sets of synonyms.

 

 

Psycholexicology

 

By focussing on historical (diachronic) evidence, the standard dictionaries neglected questions concerning the synchronic organization of lexical knowledge. A traditional dictionary lists lexical items alphabetically, giving definitions for each sense. WN, in contrast, is based on word meaning; all of the words that can express a given sense are grouped together in a SYNONYM SET, or SYNSET.

Beginning with word association studies at the turn of the century and continuing down to the sophisticated experimental tasks of the past twenty years, psycholinguists have discovered many synchronic properties of the mental lexicon that can be exploited in lexicography. For example, slips of the tongue, such as the substitution of "week" for "day" or "tomorrow" for "yesterday", show evidence for organization around meaning. Gardner’s report of such typical aphasic substitutions as "chair" for "table" and "knee" for "elbow" illustrates an organization of words around a common superordinate (the words named by the aphasics and the corresponding target words are co-hyponyms in WN).

In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database along lines suggested by these investigations. WN is the result. Inasmuch as it instantiates hypotheses based on results of psycholinguistic research, WN can be said to be a dictionary based on psycholinguistic principles.

Unfortunately, most research of interest for psycholexicology has dealt with relatively small samples of the English lexicon, often concentrating on nouns. One motive for developing WN was to expose such hypotheses to the full range of common vocabulary.

 

Lexical Matrix

 

Lexical semantics begins with a recognition that a word is a conventional association between a lexicalized concept (=meaning) and an utterance (=form) that plays a syntactic role. Word form will be used here to refer to the physical utterance or inscription and word meaning to refer to the lexicalized concept that a form can be used to express. Then the starting point for lexical semantics can be said to be the mapping between forms and meanings.

In the lexical matrix, word forms are imagined to be listed as headings for the columns and word meanings as headings for the rows. An entry in a cell of the matrix implies that the form in that column can be used (in an appropriate context) to express the meaning in that row. Entry E1.1 implies that word form F1 can be used to express word meaning M1. If there are two entries in the same column, the word form is polysemous; if there are two entries in the same row, the two word forms are synonyms (relative to a context). So some forms have several different meanings and some meanings can be expressed by several different forms.
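The matrix described above can be sketched in a few lines of Python. This is only an illustration: the labels M1, M2, M3 and F1, F2, F3 are placeholders in the spirit of the text, and the particular cells are invented for the example.

```python
from collections import defaultdict

# Each cell pairs a meaning (row) with a form (column) that can express it.
# The labels and cells are placeholders, not data from WN.
cells = {("M1", "F1"), ("M1", "F2"), ("M2", "F2"), ("M3", "F3")}

forms_by_meaning = defaultdict(set)   # one row per meaning
meanings_by_form = defaultdict(set)   # one column per form
for meaning, form in cells:
    forms_by_meaning[meaning].add(form)
    meanings_by_form[form].add(meaning)


def is_polysemous(form):
    """A form is polysemous if its column holds more than one entry."""
    return len(meanings_by_form.get(form, ())) > 1


def synonyms(form):
    """Forms that share at least one row (meaning) with `form`."""
    result = set()
    for meaning in meanings_by_form.get(form, ()):
        result |= forms_by_meaning[meaning]
    result.discard(form)
    return result
```

Here F2 occupies two rows, so it is polysemous, and it shares row M1 with F1, so the two are synonyms relative to that meaning.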

Polysemy and synonymy are problems that arise in the course of gaining access to information in the mental lexicon: a listener or reader who recognizes a form must cope with its polysemy; a speaker or writer who hopes to express a meaning must decide between synonyms. At present, WN distinguishes sharply between semantic relations and lexical relations; the emphasis is still on semantic relations between meanings, but lexical relations between words are also included.

In order to simulate a lexical matrix it is necessary to have some way to represent both forms and meanings in a computer. For word meanings, definitions can play the same role in a simulation that meanings play in the mind of a language user.

A lexical matrix can be represented for theoretical purposes by a mapping between written words and synsets. Since English is rich in synonyms, synsets are often sufficient for differential purposes. The synonym sets {board, plank} and {board, committee} can serve as unambiguous designators of these two meanings of board. These synonym sets (synsets) do not explain what the concepts are; they merely signify that the concepts exist. Sometimes an appropriate synonym is not available, in which case the polysemy can be resolved by a short gloss. For example, {board, (a person’s meals, provided regularly for money)} can serve to differentiate this sense of board from the others.
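As a rough sketch of this representation, a synset can be modelled as a set of word forms with an optional gloss. The class and function names below are hypothetical and chosen only for this example; they are not the WN file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A set of word forms plus an optional differentiating gloss."""
    words: frozenset
    gloss: str = ""


# The three senses of "board" mentioned in the text.
SYNSETS = [
    Synset(frozenset({"board", "plank"})),
    Synset(frozenset({"board", "committee"})),
    Synset(frozenset({"board"}), "a person's meals, provided regularly for money"),
]


def senses(word):
    """All synsets in which the written word appears."""
    return [s for s in SYNSETS if word in s.words]


def designator(synset):
    """Render a synset in the brace notation used in the text."""
    parts = sorted(synset.words)
    if synset.gloss:
        parts.append(f"({synset.gloss})")
    return "{" + ", ".join(parts) + "}"
```

Looking up "board" returns three synsets, while "plank" returns one; the gloss is needed only for the sense that has no distinguishing synonym.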

 

 

Semantic relations

 

WN is organized by semantic relations. Since a semantic relation is a relation between meanings, and since meanings can be represented by synsets, it is natural to think of semantic relations as pointers between synsets. It is characteristic of semantic relations that they are reciprocated.

The database is restricted to the relations suggested by the psycholinguistic data: synonymy, antonymy, hyponymy, hypernymy, and three types of meronymy and holonymy. These and other similar relations serve to organize the mental lexicon. They can all be thought of as, or represented by, pointers or labeled arcs from one synset to another.
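One way to picture labeled, reciprocated pointers is a small graph in which adding an arc also adds its inverse. The sketch below is an assumption-laden illustration: the class name, the label strings, and the use of single "meronym"/"holonym" labels (collapsing the three kinds of part relation) are simplifications for the example, not the WN implementation. Synonymy is not an arc here, since synonyms live inside the same synset.

```python
from collections import defaultdict

# Each relation label and the label of its reciprocal pointer.
RECIPROCAL = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",   # symmetric: it is its own reciprocal
}


class SynsetGraph:
    def __init__(self):
        # (source, label) -> set of targets
        self.arcs = defaultdict(set)

    def link(self, source, label, target):
        """Record that `target` is a <label> of `source`, plus the reciprocal arc."""
        self.arcs[(source, label)].add(target)
        self.arcs[(target, RECIPROCAL[label])].add(source)

    def targets(self, source, label):
        return set(self.arcs.get((source, label), ()))


graph = SynsetGraph()
graph.link("leg", "holonym", "body")     # a leg is a part of a body
graph.link("rich", "antonym", "poor")
```

After these two calls, asking for the meronyms of "body" yields "leg", and asking for the antonyms of "poor" yields "rich", without either reverse arc having been entered by hand.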


 

I Synonymy

 

It should be obvious that the most important semantic relation is similarity of meaning. According to one definition (usually attributed to Leibniz) two expressions are synonymous if the substitution of one for the other never changes the truth value of a sentence in which the substitution is made. By that definition, true synonyms are rare, if they exist at all. A weakened version of this definition would make synonymy relative to a context: two expressions are synonymous in a context C if the substitution of one for the other in C does not change the truth value. But the important point is that theories of lexical semantics do not depend on truth-functional conceptions of synonymy; semantic similarity is sufficient. The relation is symmetrical: if x is similar to y, then y is equally similar to x.

 

 

II Hyponymy (variously called subordination/superordination, subset/superset)

 

Hyponymy is transitive and asymmetrical: an x is a (kind of) y. Since there is normally a single superordinate, this semantic relation generates a hierarchical semantic structure where a hyponym is said to be below its superordinate. The hyponym inherits all the features of the more generic concept and adds at least one feature that distinguishes it from its superordinate and from other hyponyms of that superordinate.
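Transitivity and feature inheritance can be illustrated with a minimal sketch. The concepts and features below (oak, tree, plant, organism and their properties) are invented for the example and are not taken from the WN database; the single-superordinate assumption is the one stated above.

```python
# Each concept points to its single superordinate (illustrative data only).
HYPERNYM = {"oak": "tree", "tree": "plant", "plant": "organism"}

# Features attached directly at each level (illustrative data only).
FEATURES = {
    "organism": {"living"},
    "plant": {"photosynthesizes"},
    "tree": {"has trunk"},
    "oak": {"bears acorns"},
}


def hypernym_chain(concept):
    """Superordinates of `concept`, from nearest to most generic."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def is_a(x, y):
    """True if x is a (kind of) y, following the relation transitively."""
    return y in hypernym_chain(x)


def inherited_features(concept):
    """Own features plus everything inherited from the superordinates."""
    features = set(FEATURES.get(concept, ()))
    for ancestor in hypernym_chain(concept):
        features |= FEATURES.get(ancestor, set())
    return features
```

In this toy hierarchy an oak is a plant by transitivity, but a plant is not an oak, which reflects the asymmetry of the relation.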

III Antonymy

The antonym of a word x is sometimes not-x, but not always. Antonymy, which seems such a simple symmetrical relation, is at least as complex as the other semantic relations. For example, rich and poor are antonyms, but to say that someone is not rich does not mean that they must be poor.

 

 

IV Meronymy/holonymy (part-whole relation)

 

A y has an x (as a part) or an x is a part of y. The meronymic relation is transitive and asymmetrical and can be used to construct a part hierarchy. The concept of a part of a whole can be a part of a concept of the whole.

 

The adjectives include a relation of semantic similarity whereas the verbs also have sentence frames and an "entailment" relation. Because familiarity has been shown to be psycholinguistically relevant for lexical access in, for example, lexical decision times, and polysemy data are relevant for familiarity, polysemy measures are also associated with the lexical entries.

For each part of speech, different relations play a major role. Nouns are organized in lexical memory as topical hierarchies; in the adjective lexicon, antonymy and similarity organize the various lexical items; and verbs are organized by a variety of entailment relations.

 

 

VERB

 

There are far fewer verbs than nouns in the language, and verbs are more polysemous than nouns. Some 8,500 verb forms are organized into about 5,000 synsets, which are divided into 14 semantically distinct groups: verbs of bodily care and functions, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social interaction, and weather. The major division into 14 semantically coherent groups reflects the division between the major conceptual categories EVENT and STATE.

Verbs cannot easily be arranged into the kind of tree structures used for nouns. First of all, the number of hierarchical levels does not exceed four. Second, not all verbs can be grouped under a single top node or unique beginner.

A verb does not refer in the same way a noun does. The "kind-of" relation among nouns corresponds to a "manner-of" relation among verbs, called troponymy. Troponymy is the most frequently found relation among verbs: that is, most lexicalized verb concepts refer to an action or event that constitutes a manner elaboration of another activity or event.

Any acceptable statement about part-relation among verbs always involves the temporal relation between the activities that the two verbs denote. One activity or event is part of another activity or event only when it is part of, or a stage in, its temporal realization. The activities can be simultaneous (as with fatten and feed); one can include the other (as with snore and sleep); or one can precede the other (try and succeed). Three kinds of lexical entailments include the different relations illustrated by those pairs.

Causation is a kind of entailment and, like the other entailment relations, is asymmetrical. The troponyms of the verbs in a causative-resultative pair inherit the relation of their respective superordinates.

Much of the opposition among verbs is based on the morphological markedness of one member of an opposed pair. A variety of negative morphological markers (like un-, de-, dis-) attach to verbs to form their respective opposing members. Gradable verbs can be modified by degree adverbs, such as quite, rather or extremely. A fairly systematic opposition is found in the verb lexicon between co-troponyms like rise/fall and walk/run. Many verb pairs are not only in an opposition relation, but also share an entailed verb. Some pairs like teach/learn occur frequently in language within the same semantic field; they refer to the same activity, but from the viewpoint of different participants.

 

 

NOUN

 

When psychologists think about the organization of lexical memory it is nearly always the organization of nouns that they have in mind. Nominal concepts are organized hierarchically into levels, from specific to generic. The topmost, or most generic, level of the hierarchy is almost vacuous semantically. Treated as inheritance systems, these hierarchies seldom go more than ten levels deep, and the deepest cases usually contain technical levels that are not part of the everyday vocabulary. These hierarchies of nominal concepts have been said to have a level, somewhere in the middle, where most of the distinguishing features are attached. It is referred to as the base level of the noun lexicon, and concepts at this level are basic concepts. For lexical concepts at the base level, people can list many distinguishing features. Above the base level, descriptions are brief and general. Below the base level, little is added to the features that distinguish basic concepts.

 

WordNet includes 24,825 noun synsets and 32,264 different nouns with a total of 43,136 senses. In addition to lexical noun synonyms, noun synsets contain pointers to synsets representing lexical concepts that are hypernyms, hyponyms, meronyms, holonyms and antonyms.
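The three noun counts imply rough averages for polysemy and synset size. The short calculation below is a sketch that assumes each counted sense is one pairing of a noun with a synset, so that senses divided by synsets gives the mean number of nouns per synset; the variable names are chosen only for this example.

```python
# Counts quoted in the text for the noun lexicon.
NOUN_SYNSETS = 24_825
NOUN_FORMS = 32_264
NOUN_SENSES = 43_136

# Average polysemy: how many senses a noun has on average.
senses_per_noun = NOUN_SENSES / NOUN_FORMS

# Average synset size, assuming one sense = one (noun, synset) pairing.
nouns_per_synset = NOUN_SENSES / NOUN_SYNSETS

print(f"senses per noun:  {senses_per_noun:.2f}")
print(f"nouns per synset: {nouns_per_synset:.2f}")
```

Under that assumption a noun has about 1.34 senses on average and a noun synset holds about 1.74 nouns.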

Hyponymy / hypernymy is the most common relation among the noun synsets (over 25,000 hypernyms are coded).

Antonymy is rather poorly represented among the nouns, with the regular exception of nouns referring to attributes.

Meronymy/holonymy, the part-whole relation, is the other major organizer of the noun lexicon. There are three kinds of parts: components (e.g. a leg is a part of a body), members (e.g. a relative is a part of a family), and stuff (e.g. flesh is part of a leg).

 

 

ADJECTIVES

 

Linguists have provided several criteria for distinguishing between predicative and non-predicative adjectives:

  1. Predicative and non-predicative adjectives cannot be conjoined;
  2. Non-predicative adjectives are not gradable;
  3. Non-predicative adjectives cannot be nominalized.

Numbers are a special class of non-predicative adjectives: they do not conjoin, they are not gradable and they cannot be nominalized.

 

The adjective lexicon is structured around conceptual oppositions, each expressed by a pair of antonymous words. An adjective that has no direct antonym is similar in meaning to some adjective that does have a direct antonym. The morphological origin of many antonym pairs suggests that antonymy is not a semantic relation between word meanings, but rather a lexical relation between word forms.
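The cluster structure can be sketched as a pair of direct antonyms, each with a set of similar adjectives attached to it. The example words (wet/dry with damp, moist, arid, parched) and the function names are illustrative assumptions, not entries quoted from the database.

```python
# One antonym pair forms the core of a cluster (illustrative data only).
DIRECT_ANTONYM = {"wet": "dry", "dry": "wet"}

# Adjectives without a direct antonym point to the similar adjective that has one.
SIMILAR_TO = {"damp": "wet", "moist": "wet", "arid": "dry", "parched": "dry"}


def head(adjective):
    """The adjective carrying the direct antonym for this adjective, if any."""
    if adjective in DIRECT_ANTONYM:
        return adjective
    return SIMILAR_TO.get(adjective)


def opposed_adjectives(adjective):
    """The opposite pole of the cluster plus the adjectives similar to it."""
    own_head = head(adjective)
    if own_head is None:
        return set()
    opposite = DIRECT_ANTONYM[own_head]
    return {opposite} | {a for a, h in SIMILAR_TO.items() if h == opposite}
```

With this data "damp" has no direct antonym of its own, but through its similarity to "wet" it is opposed to "dry" and to the adjectives similar to "dry".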

The WordNet database includes 10,653 adjective synsets containing 12,909 different adjectives organized into 1,006 adjective clusters.

 

 

References

 

Beckwith, R., Fellbaum, C., Gross, D., Miller, G. 1990. WordNet: A Lexical Database Organized on Psycholinguistic Principles. In Zernik, U. (Ed.), Using On-line Resources to Build a Lexicon, Chapter 9, 211-231. Hillsdale, NJ: Erlbaum.

Miller, G., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J. 1990. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 3: 235-312.

Miller, G., Fellbaum, C. 1991. Semantic networks of English. Cognition, 41: 197-229.