Re: inquiry

Toshiyuki TAKEZAWA (takezawa@itl.atr.co.jp)
Tue, 14 Mar 1995 09:54:33 +0900

Dear Ms. Yamaguchi,

> I am studying at the University of Essex in England and intend to deal with
> morphological processing in Japanese as the topic of my PhD dissertation. As a
> lead-in to this subject I need to get some frequency data of Japanese words
> (especially inflectional words). After having had contact with R. Harald
> Baayen at the Max Planck Institute for Psycholinguistics in Nijmegen in
> Holland, I received your e-mail address. He wrote that you may be able to
> provide me with some convenient Japanese data from your corpora list. I would
> be very grateful if you could give me more detailed information. Thank you
> very much in advance.

Here is a list of available Japanese text corpora.

(1) spoken Japanese

The ATR corpus contains conversations between Japanese speakers
conducted by telephone and/or keyboard. All conversations are
transcribed. Morphological and syntactic tags are given.
Corresponding English text is given. About half a million words are
available. The contact address of the distribution coordinator is as
follows.

ATR (Advanced Telecommunications Research Institute) International
Research Engineering Department
Mr. Shohei TAHARA
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan
Telephone: +81 774 95 1192
Facsimile: +81 774 95 1179
Email: sho@ctr.atr.co.jp

(2) written Japanese

The EDR corpus is available. It contains 28 million sentences
collected from newspapers, magazines and so on. Morphological and
syntactic tags are given for about half a million sentences.

EDR (Japan Electronic Dictionary Research Institute, Ltd.)
Telephone: +81 3 3798 5521
Facsimile: +81 3 3798 5335

I hope that this information is of help to you. Thank you.

--
//Toshiyuki TAKEZAWA <takezawa@itl.atr.co.jp>
  ATR Interpreting Telecommunications Research Laboratories
  Kyoto, Japan