Dictionaries on CD-ROM

Jem Clear (jem@cobuild.collins.co.uk)
Thu, 25 Jan 1996 09:43:46 GMT

"haines@ISI.EDU" says:

> Answering the recent question about Japanese MRD's
> --------------------------------------------------
>
> The following dictionaries are all available on CD-ROM for between $10
> and $100 apiece. All give the phonetic (kana) writings for the words,
> and this is trivially converted to roman letters if you so desire.
>
> I have written C programs for reducing them to database
> form, and I am happy to share the programs with anyone.
>
... [list of Japanese dicts on CD omitted] ...
>
> Hundreds of dictionaries are now available on CD-ROM. Sucking the
> raw data off the CD-ROM is a trivial exercise requiring a 2.5 page
> C program that runs on any machine or operating system. I will be
> happy to supply it to anyone who wants it.

I **know** I'm biased since I work for a commercial dictionary
publisher, but this sounds very dodgy to me. I think that the "raw
data" on these CD-ROMs is very likely to be copyright material owned
by the publisher.

I am regularly harangued by friends and acquaintances working in NLP
and corpus linguistics about the way publishers tend to charge for
wordlists and text material of all sorts, but it is precisely the
cavalier attitude exemplified by Haines's view that aggravates the
situation.

It takes Cobuild hundreds of person-years to make a largish
dictionary. We only get paid if the publisher can sell the fruits of
our labour. Haines (at ISI.EDU) presumably gets his salary cheque on
the nail even if he puts every line of code he ever wrote into the
public domain, AND he gets the benefits from filching whatever
lexicons he can lay his hands on in CD-ROM format.

It doesn't sound like fair play to me.