Re: Dictionaries on CD-ROM

Ted Dunning (ted@crl.nmsu.edu)
Thu, 25 Jan 1996 10:32:14 -0700 (MST)

From: Jem Clear <jem@cobuild.collins.co.uk>

"haines@ISI.EDU" says:
> [buy a cdrom dictionary and suck the data into a database]

I **know** I'm biased since I work for a commercial dictionary
publisher, but this sounds very dodgy to me. I think that the "raw
data" on these CD-ROMS is very likely to be copyright material owned
by the publisher.

there really are several questions.

the first and foremost is what the terms in the license agreement are.
if it says that you agree to access the dictionary only via the
provided software, then the researcher who reformats the data is in a
pretty difficult spot. the owner of the copyright really does have
the legal right to specify how you can use their copyrighted
information.

but what does research fair use mean here? doesn't the private
investigation of the properties of a dictionary fall into the fair use
exception?

if i buy a dictionary which does not have a "don't copy or reformat at
all" license, then why can't i manipulate that dictionary at will for
research purposes? i would prefer to work with a company like cobuild
which will actually grant me a license for using the dictionary for
research purposes, but what about some other companies that don't even
return calls on the subject?

I am regularly harangued by friends and acquaintances working in NLP
and corpus linguistics about the way publishers tend to try to charge
for the provision of wordlists and text material of all sorts, but it
is this cavalier attitude exemplified by the Haines view that really
aggravates the situation.

indeed.

on the other hand, suppose we have a scenario in which

a) researcher x takes apart a cdrom to produce a research lexicon
b) this research lexicon works *fabulously* well
c) company y decides to fund the commercialization of this lexicon
d) researcher x directs y back to the original publisher who now has a
product to sell that they never knew that they had.

doesn't this process help the publisher enormously?

i agree that there is the alternative scenario which might be called
the "webster" scenario in which

a) researcher x gets a tape from a publisher for research purposes
b) researcher x puts the data on the net and everybody and their
brother gets a copy
c) nobody ever buys another digital copy from the publisher.

the real question is how we can make scenario 1 happen much more often
than scenario 2. i am sure that there are publishers who would object
to both possibilities, but these people really don't have their heads
on straight. in scenario 1, everybody wins. is there any way that
this can be made to happen all the time?