Re: Corpora: Corpus Linguistics User Needs

Geoffrey Sampson (geoffs@cogs.susx.ac.uk)
Wed, 29 Jul 1998 10:54:52 +0100

I'm afraid my response risks sounding a little arrogant, but this is a point
that has puzzled me for years. You are quite right to say that many
corpus linguists do not know how to write programs, and rely on software
produced by others which may not meet their needs. It has always seemed to
me that the answer to a corpus linguist who sees this as a problem is
"Learn to program, then". I have never understood why it has become socially
acceptable for even quite junior academics to say "I can't program, someone
else will have to do this for me", while they wouldn't dream of saying
"I don't know how big library catalogue systems work, someone else will
have to fetch my books".

(In case anyone thinks "It's all very well for him to write that way, he is
a computer specialist", perhaps I should mention that my first degree was
in Chinese, mainly classical Chinese language, literature, Chinese history,
etc., plus a little general linguistics. I decided to learn about computers
as a graduate student because it was clear that they were destined to
become useful tools in linguistics.)

I believe this situation is not just a social oddity but is having unfortunate
consequences for progress in corpus linguistics. There is now an attitude
abroad that anyone who produces corpus research resources has not finished his
job unless he also produces purpose-built software for extracting information
from the resource. Since any such software will necessarily anticipate only
some of the questions that users might want to ask, and will fail to
provide for answering other kinds of question, this channels research into
a limited range of "obvious" directions and discourages originality. My
policy with SUSANNE and subsequent resources that I have been responsible for
creating has been to give the files an extremely simple and well-documented
structure, so that it is as easy as possible for researchers to write
programs to extract whatever type of information they want. But I have
become used to people asking "Where is the software to go with SUSANNE?",
like American tourists in a lovely old hotel gazing upstairs
to the first floor and forlornly asking "Where's the lift?"

Geoffrey Sampson

School of Cognitive & Computing Sciences
University of Sussex
Falmer, Brighton BN1 9QH, GB

e-mail geoffs@cogs.susx.ac.uk
tel. +44 1273 678525
fax +44 1273 671320
Web site http://www.grs.u-net.com