Re: Thesaurus

t-markl@microsoft.com
Tue, 1 Aug 95 12:31:03 PDT

|From: DAVID JOHN CONIAM <B096770@idea.csc.cuhk.hk>
|To: <corpora@hd.uib.no>
|Subject: Thesaurus
|Date: Tuesday, 1 August 1995 7:27AM
|
|Can anyone point me to ftp sites where there are downloadable
|English thesauruses? I have heard that Roget's is available.

Hi David,

You have at least a couple of options:

1) Use WordNet, a freely available lexical taxonomy consisting
of small synonym sets (about 4 words in each) linked by various
semantic relations (ISA, HAS_PART, etc.). This was developed
by George Miller (1990) and associates. It contains around 167,000
word senses, including nouns, verbs, adjectives and adverbs.
ftp://clarity.princeton.edu/pub/wordnet/wn1.5unix.tar.gz.a
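
To make the "synonym sets linked by semantic relations" structure concrete, here is a minimal sketch. It uses a few made-up synsets and relations (the ids such as car_1 and all helper names are hypothetical). It does not read the real WordNet distribution files and does not reproduce their format. It only shows the shape of the data: each synset is a set of words, and typed links (ISA, HAS_PART) connect synsets.

```python
from collections import defaultdict

# Hypothetical toy data in the spirit of WordNet, NOT taken from the database.
# Each synset id maps to a small set of synonymous words.
SYNSETS = {
    "car_1": {"car", "auto", "automobile", "motorcar"},
    "motor_vehicle_1": {"motor vehicle", "automotive vehicle"},
    "wheel_1": {"wheel"},
}

# Semantic relations between synsets as (source, relation, target) triples.
RELATIONS = [
    ("car_1", "ISA", "motor_vehicle_1"),
    ("car_1", "HAS_PART", "wheel_1"),
]


def build_index(relations):
    """Map each synset id to its outgoing links: {id: {relation: [targets]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in relations:
        index[src][rel].append(dst)
    return index


def synonyms(word):
    """Return the other words that share a synset with `word`, sorted."""
    result = set()
    for members in SYNSETS.values():
        if word in members:
            result |= members - {word}
    return sorted(result)


def hypernym_words(synset_id, index):
    """Return the words in synsets reached by one ISA link, sorted."""
    words = set()
    for target in index[synset_id]["ISA"]:
        words |= SYNSETS[target]
    return sorted(words)


if __name__ == "__main__":
    idx = build_index(RELATIONS)
    print(synonyms("car"))               # ['auto', 'automobile', 'motorcar']
    print(hypernym_words("car_1", idx))  # ['automotive vehicle', 'motor vehicle']
```

Keeping relations as separate triples, rather than inside each synset, makes it easy to add new relation types without changing how synsets are stored.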

2) Use Roget's 1911 Thesaurus from Project Gutenberg, which consists
of 1043 categories, each containing nouns, verbs, adjectives, adverbs
and phrases. On average, each category contains 34 single-word nouns.
This was entered by Patrick Cassidy of Micra Inc. The file is in
human-readable form and so requires quite a bit of massaging to get a
machine-tractable version. I have done this already for the nouns (see
Lauer, 1995; Resnik, 1995), and if anyone would like to use my version,
please email me and I will try to get back to you as soon as I can.
ftp://mrcnext.cso.uiuc.edu/etext/etext91/roget13a.txt
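
The "massaging" step could look something like the minimal sketch below. It assumes a simplified, hypothetical line layout of "#<number>. <Category>. N. noun, noun; noun. V. verb, verb." This is NOT the exact layout of roget13a.txt, and the sketch is not the version described above. It splits each category line into part-of-speech sections, keeps the single-word nouns, and averages their count across categories. All function names and sample lines are illustrative.

```python
import re

# Hypothetical, simplified line format (NOT the real layout of roget13a.txt):
#   "#<number>. <Category>. N. noun, noun; noun. V. verb, verb."
HEADER = re.compile(r"^#(\d+)\.\s+([^.]+)\.\s*(.*)$")
POS_MARKER = re.compile(r"\b(N|V|Adj|Adv|Phr)\.\s+")


def parse_category(line):
    """Parse one category line into (number, name, {pos: [entries]})."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    number, name, body = int(m.group(1)), m.group(2), m.group(3)
    # split() with a capture group yields [before, pos1, text1, pos2, text2, ...]
    parts = POS_MARKER.split(body)
    sections = {}
    for pos, text in zip(parts[1::2], parts[2::2]):
        entries = [e.strip().rstrip(".") for e in re.split(r"[,;]", text)]
        sections.setdefault(pos, []).extend(e for e in entries if e)
    return number, name, sections


def single_word_nouns(sections):
    """Keep only noun entries that are a single word (no internal space)."""
    return [e for e in sections.get("N", []) if " " not in e]


def average_single_word_nouns(lines):
    """Average number of single-word nouns per successfully parsed category."""
    counts = []
    for line in lines:
        parsed = parse_category(line)
        if parsed is not None:
            counts.append(len(single_word_nouns(parsed[2])))
    return sum(counts) / len(counts) if counts else 0.0


if __name__ == "__main__":
    sample = [  # made-up lines in the assumed format
        "#1. Existence. N. existence, being, entity; state of being. V. exist, be.",
        "#2. Inexistence. N. inexistence, nonexistence; nothingness. V. not exist.",
    ]
    print(parse_category(sample[0]))
    print(average_single_word_nouns(sample))  # 3.0
```

Multi-word noun phrases such as "state of being" are parsed but excluded from the single-word count, which mirrors the distinction between single-word nouns and phrases drawn above.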

Hope this is of use to people out there.

Best wishes,
Mark Lauer
Microsoft Institute
Sydney, Australia

Miller, G. (1990) WordNet: An On-line Lexical Database.
In International Journal of Lexicography, Vol. 3(4).

Lauer, M. (1995) Corpus Statistics Meet
The Compound Noun: Some Empirical Results.
In Proceedings of the 33rd Annual Meeting
of the Association for Computational Linguistics,
Cambridge, MA.

Resnik, P. (1995) Disambiguating Noun Groupings
with Respect to WordNet Senses.
In Proceedings of the Third Workshop on Very Large Corpora,
Cambridge, MA.