Re: Corpora: Discovering new senses

Paul Buitelaar (paulb@dfki.de)
Wed, 24 Nov 1999 14:48:32 +0100

Ken Litkowski wrote:

> > Note that I am *not* interested in classifying NEW words in the sense
> > of, for instance, Dekang Lin's work in :
> >
>
> But, Dekang's work is crucial to identifying new senses, primarily by
> comparing his results against existing senses and then by examining the
> subcat patterns and lexical preferences for ones that don't fit what is
> indicated in current MRDs.

I agree. As I already wrote to Dimitris, I did some work in this direction
for my thesis, which was mainly about extracting systematically polysemous
classes from WordNet and which resulted in the CoreLex database.

What I then did was to train statistical models of the CoreLex classes (on
training corpora: Brown and Wall Street Journal in this case) and to classify
words as they occur in other corpora according to these models. Words will
then fall into the classes ('senses') that they originally belong to
(according to CoreLex, or whatever classification you use), but sometimes
also into new classes (again 'senses') because their usage in the particular
corpus is rather different from that in the training corpora. Such a new
class may be either a sense that was not represented in the training corpora,
or a genuinely new sense that the word did not have in the CoreLex database.
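
To make the procedure concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions, not the thesis implementation:
the class labels, the toy data and the function names are all hypothetical,
and a smoothed unigram model over context words stands in for whatever
statistical model is actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: word -> classes it is known to belong to.
# The labels are invented for this sketch; they are not real CoreLex classes.
lexicon = {
    "bank": {"institution"},
    "firm": {"institution"},
    "river": {"natural_object"},
    "shore": {"natural_object"},
}

# Toy stand-in for a training corpus: (target word, context words).
training = [
    ("bank", ["loan", "money", "account"]),
    ("firm", ["money", "shares", "loan"]),
    ("river", ["water", "flow", "fish"]),
    ("shore", ["water", "sand", "fish"]),
]

# Toy stand-in for occurrences in another corpus.
test = [
    ("bank", ["water", "fish", "sand"]),
    ("firm", ["loan", "shares"]),
]


def train_class_models(occurrences, lexicon):
    """Pool context-word counts per class over all words in that class."""
    models = defaultdict(Counter)
    for word, context in occurrences:
        for cls in lexicon.get(word, ()):
            models[cls].update(context)
    return models


def classify(context, models, vocab_size, alpha=1.0):
    """Pick the class whose add-alpha smoothed unigram model best fits the context."""
    def log_likelihood(cls):
        counts = models[cls]
        total = sum(counts.values())
        return sum(
            math.log((counts[w] + alpha) / (total + alpha * vocab_size))
            for w in context
        )
    return max(models, key=log_likelihood)


def candidate_new_senses(occurrences, models, lexicon, vocab_size):
    """Return (word, class) pairs where the predicted class is not in the lexicon."""
    found = []
    for word, context in occurrences:
        cls = classify(context, models, vocab_size)
        if cls not in lexicon.get(word, set()):
            found.append((word, cls))
    return found


models = train_class_models(training, lexicon)
vocab = {w for _, context in training for w in context}
candidates = candidate_new_senses(test, models, lexicon, len(vocab))
print(candidates)
```

A pair flagged this way is only a candidate: as said above, it can be a sense
that was missing from the training data just as well as a genuinely new one,
so it still has to be checked by hand.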

Paul Buitelaar
Senior Researcher
DFKI-Language Technology
Saarbruecken, Germany

http://www.dfki.de/~paulb