Re: Polysemy

Christopher D Manning (manning+@andrew.cmu.edu)
Thu, 18 Jan 1996 10:53:21 -0500 (EST)

Michael Barlow writes:
>I would appreciate any information/leads on what methods are being used
>to provide corpus-based confirmatory evidence for the
>linguist/lexicographer's postulation of different senses/uses of a word.

Algorithms for "unsupervised word sense disambiguation" (perhaps it
should really be called "word sense clustering") directly address this
problem since they attempt to determine for themselves corpus-motivated
clusterings of uses of a word that represent sense distinctions. The
main work in this area is:

Schuetze, H. Dimensions of Meaning. In Proceedings of Supercomputing
'92, 1992.

Schuetze, H. and Pedersen, J. Information retrieval based on word
senses. 4th Symposium on Document Analysis and Information Retrieval,
1995.
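
To make the idea of "word sense clustering" concrete, here is a minimal
sketch in Python. It is not the method of the papers above (which work
with much richer vector representations); it only illustrates the
general recipe: represent each use of a word by its context words, then
cluster the uses. The sentences, the stoplist, the window size, and the
function names are all invented for illustration.

```python
from collections import Counter
import math

# Hypothetical stoplist, chosen only to keep the toy example clean.
STOPWORDS = {"the", "of", "on", "in", "a", "and", "was", "he", "she", "for"}

def context_vector(tokens, i, window=5):
    """Count the non-stopword tokens within `window` positions of index i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    neighbours = tokens[lo:i] + tokens[i + 1:hi]
    return Counter(w for w in neighbours if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_uses(vectors, k=2, iterations=10):
    """Toy k-means over context vectors, using cosine similarity."""
    # Deterministic start: the first use, then the use least similar
    # to the centroids chosen so far.
    centroids = [Counter(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(Counter(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each use to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid as the sum of its members
        # (cosine ignores scale, so a sum works as well as a mean).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[j] = total
    return labels

# Invented example uses of "bank".
sentences = [
    "he sat on the bank of the river fishing",
    "the river bank was muddy after rain",
    "she deposited money in the bank account",
    "the bank approved the money for the loan",
]
vectors = []
for s in sentences:
    toks = s.split()
    vectors.append(context_vector(toks, toks.index("bank")))

labels = cluster_uses(vectors, k=2)
print(labels)  # [0, 0, 1, 1]
```

The cluster numbers themselves carry no meaning; what matters is that
the two "river" uses end up together and apart from the two "money"
uses, with no sense inventory supplied in advance.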

In addition:

Yarowsky, D. Unsupervised Word Sense Disambiguation Rivaling Supervised
Methods. ACL 33, 1995.

advertises itself as an unsupervised method, but its "seeding" step
requires externally given senses (such as from a dictionary), so it
doesn't really qualify as independently suggesting sense clusters.
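
A rough sketch of why seeding matters, again in Python. This is a
heavily simplified bootstrapping loop of my own, not the algorithm of
the paper (no decision lists, no probability estimates): the sense
labels and one seed word per sense are handed in from outside, and the
procedure only spreads those labels to further contexts. The contexts,
sense names, and function name are invented for illustration.

```python
def bootstrap_senses(contexts, seeds, rounds=5):
    """Spread externally supplied sense labels from seed words.

    contexts: list of sets of context words, one set per use of the word
    seeds:    dict mapping each sense name to a set of seed words
    """
    indicators = {sense: set(words) for sense, words in seeds.items()}
    labels = [None] * len(contexts)
    for _ in range(rounds):
        changed = False
        # Label any unlabeled context that matches exactly one sense.
        for i, ctx in enumerate(contexts):
            if labels[i] is None:
                hits = [s for s, words in indicators.items() if ctx & words]
                if len(hits) == 1:
                    labels[i] = hits[0]
                    changed = True
        # Promote words seen with only one sense to indicators of it.
        seen = {}
        for ctx, lab in zip(contexts, labels):
            if lab is not None:
                for w in ctx:
                    seen.setdefault(w, set()).add(lab)
        for w, senses in seen.items():
            if len(senses) == 1:
                indicators[next(iter(senses))].add(w)
        if not changed:
            break
    return labels

# Invented contexts for "plant"; the two senses are given, not discovered.
contexts = [
    {"life", "species"},
    {"manufacturing", "equipment"},
    {"species", "grow"},
    {"equipment", "workers"},
    {"grow", "garden"},
]
seeds = {"living": {"life"}, "factory": {"manufacturing"}}

labels = bootstrap_senses(contexts, seeds)
print(labels)  # ['living', 'factory', 'living', 'factory', 'living']
```

Note that the set of possible outputs is fixed by the keys of `seeds`:
the loop can never propose a third sense on its own.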

The other main possibility, as you mention, is to use other languages as
a guide to what sense distinctions should be recognized in the target
language. Work that has addressed this includes:

Brown, P., Della Pietra, S., Della Pietra, V., Mercer, R. Word sense
disambiguation using statistical methods. ACL 29, 1991.

Dagan, I. and Itai, A. Word sense disambiguation using a second language
monolingual corpus. Computational Linguistics 20, 1994.

Gale, W., Church, K., and Yarowsky, D. A Method for Disambiguating Word
Senses in a Large Corpus. Computers and the Humanities 26, 1993.

This last one is also a good general intro to issues in word sense
disambiguation.
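
The basic intuition behind the second-language approach can be sketched
very simply, assuming you already have word-aligned parallel text: group
the uses of a word by how it was translated, and treat each group as a
candidate sense. The aligned pairs below are invented toy data and the
function name is hypothetical; the papers above do considerably more
than this.

```python
from collections import defaultdict

def senses_from_translations(aligned_uses):
    """Group source-language contexts by the aligned translation.

    aligned_uses: list of (context, translation) pairs for one word
    """
    groups = defaultdict(list)
    for context, translation in aligned_uses:
        groups[translation].append(context)
    return dict(groups)

# Invented English uses of "sentence" with an assumed French alignment.
aligned_uses = [
    ("the judge imposed a sentence of five years", "peine"),
    ("the second sentence of the paragraph", "phrase"),
    ("he served his sentence in full", "peine"),
]

groups = senses_from_translations(aligned_uses)
for translation, uses in groups.items():
    print(translation, len(uses))
```

Here the other language, rather than a dictionary or a clustering
algorithm, decides how many sense distinctions are recognized.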

Chris.