A source for MRD's

haines@ISI.edu
Thu, 15 Jun 1995 16:59:15 -0700

> what tools and resources are available for Japanese, either commercially or
> public domain?

The Juman part-of-speech tagger is available from Kyoto University,
I think.

Here at ISI we have Kenkyusha's EJ and JE, the Kojien, the New Crown JE,
New Century EJ, the Concise Loanword Dictionary, and a handful of others
broken down into database format. (I wrote Lisp programs to do this, so
you can buy the dictionaries on your own and then apply my programs to
them.)

There is actually a long list of MRDs available on CD, which one can get
by calling (in the US) 1-800-203-3001 or +44 71 916-8375 (London) and
requesting the Electronic Book Catalog. Languages include Japanese, German,
French, Italian, Spanish, Danish, Dutch, etc. Also included are parallel books
on CD, encyclopediae, etc. A veritable gold mine of stuff.

Unfortunately, all of this material is encoded in the Electronic Book format.
Fortunately, I have spent the last year and a half breaking the code
and writing programs to dump Electronic Books to disk.

> Also I have heard that there is a public domain taxonomy in
> Australia which has been used as the basis for various MT systems.

I know of Jim Breen's Edict, which is a hand-compiled, freely available
JE dictionary. We use it extensively here at ISI in the JapanGloss JE
MT system. I would be very interested in a taxonomy, too.
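
For anyone who has not looked inside Edict, here is a rough sketch of how
one entry can be pulled apart. It assumes the usual one-entry-per-line
layout (headword, optional reading in square brackets, then glosses
separated by slashes). The function name and the sample line are my own
illustrations, not part of Edict or of the ISI programs, and the file
itself would first need decoding from its Japanese encoding (EUC-JP, as
far as I know).

```python
import re

# Assumed Edict entry layout: HEADWORD [READING] /gloss 1/gloss 2/
# The bracketed reading is absent when the headword is already in kana.
ENTRY = re.compile(r"^(\S+)\s+(?:\[([^\]]*)\]\s+)?/(.*)/\s*$")

def parse_edict_line(line):
    """Split one Edict-style line into (headword, reading, glosses).

    Returns None when the line does not look like an entry.
    """
    m = ENTRY.match(line)
    if m is None:
        return None
    headword, reading, body = m.groups()
    # Drop empty pieces produced by doubled or trailing slashes.
    glosses = [g for g in body.split("/") if g]
    # Fall back to the headword when no separate reading is given.
    return headword, reading or headword, glosses

# Illustrative sample line, not copied from the real file.
sample = "辞書 [じしょ] /dictionary/lexicon/"
print(parse_edict_line(sample))
```

Returning None for lines that do not match makes it easy to skip the
header line or any malformed entries when looping over the whole file.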

A good place to start looking for Japanese resources (Edict included)
is http://www.realtime.net/~adamrice/

--Matthew