Re: Corpora: Typical number of senses

Ted E. Dunning (ted@aptex.com)
Fri, 7 Nov 1997 10:56:39 -0800

i am sorry not to be able to contribute quite as constructively as i
would like, but it should be pointed out that this question isn't
quite well posed:

>>>>> "yy" == Yaari Yaakov <yyaari@macs.biu.ac.il> writes:

yy> Can anyone knows the average number (or typical, or expected
yy> value) of senses per noun phrase in Wordnet?

there are a couple of interpretations possible here which give
radically different results.

for example, if the question were to mean how many senses there are
for noun phrases which are actually entries in wordnet, the answer is
very nearly one. this is because phrases are used in wordnet to
tightly express the meaning of an individual sense (for example "paper
tiger" vs. "bengal tiger").

on the other hand, the question might refer to how ambiguous a generic
phrase or sentence found in a corpus might be given wordnet as a
reference. here, the number of possible readings would be exponential
(or more) in the number of words in the phrase or sentence. my guess
is that the base of the exponential would be >2 and possibly >4. this
sort of measure is closely related to perplexity measures for speech
systems, and perplexity would be a good measure of the improvement in
performance of a real disambiguator.
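
to make the exponential claim concrete, here is a rough sketch in
python. the per-word sense counts are invented for illustration (they
are not taken from wordnet), and the function names are hypothetical.
it assumes the senses of different words combine independently, so the
number of readings is the product of the per-word counts and the "base"
is their geometric mean.

```python
# toy per-word sense counts; these numbers are made up, NOT from wordnet
sense_counts = {"the": 1, "bank": 10, "raised": 8, "interest": 7, "rates": 5}

def num_readings(words, counts):
    """number of sense combinations, assuming words vary independently."""
    total = 1
    for w in words:
        total *= counts[w]
    return total

def per_word_base(words, counts):
    """geometric mean of the sense counts: the base of the exponential."""
    return num_readings(words, counts) ** (1.0 / len(words))

phrase = ["the", "bank", "raised", "interest", "rates"]
readings = num_readings(phrase, sense_counts)   # 1 * 10 * 8 * 7 * 5
base = per_word_base(phrase, sense_counts)      # between 4 and 5 here

print(readings, round(base, 2))
```

the geometric mean plays the role that perplexity plays for a language
model with uniform choices: a disambiguator that rules out senses
lowers it.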

if you take this last question as the one of interest, you would have
to look seriously at wilks et al.'s recent work on disambiguation using
only POS tags. their claim is that most of the word sense ambiguity
goes away once you know the part of speech tags.
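
a small sketch of why POS tags would help, again in python. the sense
inventory below is a made-up toy (each sense is recorded only by its
part of speech), so it illustrates the mechanism and says nothing about
how large the effect is on real text.

```python
# toy inventory: each word maps to the POS of each of its senses.
# these entries are hypothetical, NOT taken from wordnet.
toy_senses = {
    "bank":     ["n", "n", "n", "v", "v"],
    "raised":   ["v", "v", "v", "a"],
    "interest": ["n", "n", "n", "v"],
}

def senses_left(word, pos=None):
    """count senses of word, keeping only those matching pos if given."""
    senses = toy_senses[word]
    if pos is None:
        return len(senses)
    return sum(1 for p in senses if p == pos)

def readings(words, tags=None):
    """product of remaining sense counts across the phrase."""
    total = 1
    for i, w in enumerate(words):
        total *= senses_left(w, tags[i] if tags else None)
    return total

words = ["bank", "raised", "interest"]
untagged = readings(words)                  # 5 * 4 * 4
tagged = readings(words, ["n", "v", "n"])   # 3 * 3 * 3

print(untagged, tagged)
```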