Corpora: Release of CoreLex: ONTOLOGY, LEXICAL SEMANTIC DATABASE and

Paul Buitelaar (paulb@cs.brandeis.edu)
Tue, 20 Jan 1998 18:36:07 -0500

Announcing the release of CoreLex

An ONTOLOGY, LEXICAL SEMANTIC DATABASE and TAGSET for nouns,
organized around SYSTEMATIC POLYSEMY and UNDERSPECIFICATION.

CoreLex developed out of a thesis on systematic polysemy and underspecification of
nouns. It establishes an ontology and semantic database of 126 semantic types,
covering around 40,000 nouns, and defines a large number of systematically polysemous
classes derived from a careful analysis of sense distributions in WordNet.
The semantic types are underspecified representations based on Generative Lexicon
theory and are used in an underspecified approach to semantic tagging, addressing
two problems: sense enumeration (the difficulty of deciding the number of discrete
senses), due to systematic polysemy; and multiple reference (NPs denoting more
than one model-theoretic referent), due to underspecification. Semantic tags that
are based on traditional, discrete senses tend to be too fine-grained for
practical use. For instance, WordNet has, at the lowest level, around 60,000
different tags (synsets) for nouns alone. The CoreLex approach, on the other hand,
offers a concise set of 126 tags that are inherently more coarse-grained, by
taking into account systematic polysemy and underspecification.
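
To make the idea of deriving classes from sense distributions more concrete, here is
a minimal sketch in Python. It is not the CoreLex procedure itself: the toy data, the
type names, and the functions polysemous_classes and underspecified_tag are all
hypothetical and chosen only for illustration. The sketch assumes each noun has
already been mapped to the set of coarse types found across its senses. It then
groups nouns that share the same combination of types and uses that combination as
a single underspecified tag.

```python
from collections import defaultdict

# Hypothetical toy data: noun -> set of coarse types over all of its senses.
# These entries are illustrative only and are not taken from CoreLex or WordNet.
NOUN_TYPES = {
    "book": {"artifact", "communication"},
    "newspaper": {"artifact", "communication"},
    "chicken": {"animal", "food"},
    "lamb": {"animal", "food"},
    "table": {"artifact"},
}


def polysemous_classes(noun_types, min_members=2):
    """Group nouns that share the same combination of two or more types.

    A combination counts as a class here only if at least `min_members`
    nouns exhibit it (an assumed threshold, used as a rough stand-in for
    "systematic" as opposed to accidental polysemy).
    """
    groups = defaultdict(list)
    for noun, types in noun_types.items():
        if len(types) > 1:
            groups[frozenset(types)].append(noun)
    return {
        types: sorted(nouns)
        for types, nouns in groups.items()
        if len(nouns) >= min_members
    }


def underspecified_tag(noun, noun_types, classes):
    """Return one coarse tag for a noun instead of a discrete sense.

    Nouns in a polysemous class get the joined class label, nouns with a
    single type get that type, and anything else gets None.
    """
    types = frozenset(noun_types.get(noun, ()))
    if types in classes:
        return "+".join(sorted(types))
    if len(types) == 1:
        return next(iter(types))
    return None


classes = polysemous_classes(NOUN_TYPES)
for noun in ("book", "lamb", "table"):
    print(noun, underspecified_tag(noun, NOUN_TYPES, classes))
```

In this sketch a tagger never has to choose between the "artifact" and
"communication" senses of "book"; it assigns the combined label and leaves the choice
open, which is the sense in which the tag is underspecified.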

The CoreLex database is freely available for research purposes, including
commercial ones. For more information on the database and on the thesis that
describes its motivation, construction and use, see the CoreLex webpage:

http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html