Info on MRD's (answers! not a question!)

haines@ISI.EDU
Wed, 24 Jan 1996 09:21:26 -0800

Answering the recent question about Japanese MRD's
--------------------------------------------------

The following dictionaries are all available on CD-ROM for between $10
and $100 apiece. All give the phonetic (kana) reading of each word,
which is easily converted to roman letters if you so desire.

I have written C programs for reducing them to database
form, and I am happy to share the programs with anyone.

* Kojien (Japanese equivalent of the Oxford English Dictionary,
  more than 150,000 headwords)
* Kodansha Dai Nihongo Jiten (180,000 headwords)
* Kenkyusha's Middle Japanese-English Dictionary (35,000 headwords)
* Crown New Century JE Dictionary (40,000 headwords + 10,000 loan words)
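
To illustrate the kana-to-roman conversion mentioned above, here is a
minimal sketch in C. It is not one of the programs offered in this
message. It assumes the readings have already been converted to UTF-8
and that the source file is saved as UTF-8 (the discs themselves may use
a different Japanese encoding), and the table covers only a handful of
hiragana. The names kana_map, KANA_TABLE and kana_to_romaji are made up
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table entry: a UTF-8 kana string and its roman-letter spelling. */
struct kana_map {
    const char *kana;
    const char *romaji;
};

/* Deliberately tiny, illustrative table (Hepburn-style spellings).
   Two-kana combinations come first so they match before single kana. */
static const struct kana_map KANA_TABLE[] = {
    { "きょ", "kyo" }, { "しょ", "sho" }, { "じょ", "jo" },
    { "あ", "a" },  { "い", "i" },   { "う", "u" },  { "え", "e" },  { "お", "o" },
    { "か", "ka" }, { "き", "ki" },  { "く", "ku" }, { "け", "ke" }, { "こ", "ko" },
    { "し", "shi" }, { "じ", "ji" }, { "に", "ni" }, { "ほ", "ho" }, { "ん", "n" },
};

/* Convert the UTF-8 kana string `in` to roman letters in `out`, which has
   room for `cap` bytes including the terminator. Returns 0 on success, or
   -1 if a character is not in the table or the buffer is too small. */
int kana_to_romaji(const char *in, char *out, size_t cap)
{
    const size_t n = sizeof KANA_TABLE / sizeof KANA_TABLE[0];
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';

    while (*in != '\0') {
        size_t i;
        for (i = 0; i < n; i++) {
            size_t klen = strlen(KANA_TABLE[i].kana);
            if (strncmp(in, KANA_TABLE[i].kana, klen) == 0) {
                size_t rlen = strlen(KANA_TABLE[i].romaji);
                if (used + rlen + 1 > cap)
                    return -1;              /* output buffer too small */
                memcpy(out + used, KANA_TABLE[i].romaji, rlen + 1);
                used += rlen;
                in += klen;
                break;
            }
        }
        if (i == n)
            return -1;                      /* character not in the table */
    }
    return 0;
}
```

This is a plain kana-by-kana substitution. A real converter would need
the full kana table (including katakana) and rules for long vowels, the
small tsu, and n before a vowel, none of which are handled here.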

> Hi, we'd be very glad to hear of a machine-readable list of
> alphabetically or phonemically-coded Japanese words if anyone
> knows of one. We want to search for particular phonological patterns
> in a large set of words.
>
> Peter Roach and Shuri Kumagai
>
> (please reply to s.kumagai@reading.ac.uk)

Answering the more general question about MRD's
-----------------------------------------------

Hundreds of dictionaries are now available on CD-ROM. Pulling the
raw data off the CD-ROM is a simple exercise: it takes a 2.5-page
C program that runs on any machine or operating system. I will be
happy to supply it to anyone who wants it.

Languages include:

English, Japanese, German, French, Spanish, Catalan, and many others
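
For readers who want a feel for the raw-extraction step described above,
the core of such a program might look like the following sketch. It is
not the 2.5-page program itself, only an illustration that assumes the
disc (or a data file on it) can be opened as an ordinary binary stream.
The name copy_raw is invented for this example; 2048 bytes is the usual
size of a CD-ROM data sector.

```c
#include <assert.h>
#include <stdio.h>

#define SECTOR_SIZE 2048   /* usual data-sector size on a CD-ROM */

/* Copy everything readable from `in` to `out`, one sector-sized block at
   a time. Returns the number of bytes copied, or -1 on a read or write
   error. */
long copy_raw(FILE *in, FILE *out)
{
    unsigned char block[SECTOR_SIZE];
    long total = 0;
    size_t got;

    assert(in != NULL && out != NULL);

    while ((got = fread(block, 1, sizeof block, in)) > 0) {
        if (fwrite(block, 1, got, out) != got)
            return -1;                      /* write failed */
        total += (long)got;
    }
    return ferror(in) ? -1 : total;         /* distinguish error from EOF */
}
```

Both streams should be opened in binary mode ("rb" and "wb") so that no
newline translation corrupts the data on systems that distinguish text
from binary files.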

Once the raw data is pulled off the CD-ROM, the next step is breaking
it down into a database. This is a bit harder. Over the last two
years I have done this for more than ten dictionaries, and in the
process I have developed a set of tools for the job. Breaking down
a dictionary typically takes about a week. The general tools
should be available in C in a month or two.
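
As a rough illustration of what "database form" can mean, the sketch
below splits one line of a purely hypothetical intermediate file into
three fields: headword, reading, and gloss, separated by tabs. Real
dictionary data is laid out differently on every disc, so this shows
only the last and easiest step, not the tools described above. The
names struct entry and parse_entry are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One dictionary record; the pointers refer into the caller's line buffer. */
struct entry {
    char *headword;
    char *reading;
    char *gloss;
};

/* Split `line` in place at its first two tab characters. Any later tabs
   stay inside the gloss. Returns 0 on success, or -1 if the line has
   fewer than three fields. */
int parse_entry(char *line, struct entry *e)
{
    char *tab1, *tab2;

    assert(line != NULL && e != NULL);

    line[strcspn(line, "\r\n")] = '\0';     /* drop the line terminator */

    tab1 = strchr(line, '\t');
    if (tab1 == NULL)
        return -1;
    tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL)
        return -1;

    *tab1 = '\0';
    *tab2 = '\0';
    e->headword = line;
    e->reading  = tab1 + 1;
    e->gloss    = tab2 + 1;
    return 0;
}
```

Because the fields point into the original buffer, the caller must keep
that buffer alive (or copy the strings) for as long as the entry is used.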

--Matthew Haines