Corpora: ignorance among the media

Jock McNaught (jock@ccl.umist.ac.uk)
Thu, 6 Nov 1997 18:34:02 +0000 (GMT)

The UK newspaper 'The Guardian' published an article on Tuesday Nov 4th
entitled 'The lucrative dictionary of life' by David Rowan and John Ezard
(page 15). This was essentially about new words appearing in the language
and focussed on a list of words issued by Collins Dictionaries that were
said to be representative of the time they appeared in the language.
(That is the surface interpretation: UK media watchers will note a subtext
to do with The Guardian indulging in some bashing of its rival, the Murdoch
press empire, but why should corpus linguistics be dragged through
the mud because of that?)

Although the discussion left a lot to be desired in terms of level of
knowledge about lexicography on the part of the writers, what drew my
attention was this sentence:

"Collins uses the Bank of English, based in Birmingham, as the so-called
corpus with which it analyses the language."

Note the use of the highly insulting 'so-called' which implies that, in
the writers' opinion, 'corpus' is an incorrect designation. This is surely
an insult not only to our colleagues at Birmingham and Collins, but also
to every corpus linguist! If you want to complain about the writers'
startling ignorance, then by all means send an e-mail to their
address:

analysis@guardian.co.uk

and let them know that not only is 'corpus' an established term but that
the notion of corpus lies at the heart of modern lexicography, corpus
linguistics and related fields such as natural language processing.

JMcN

-- 

John McNaught                          jock@ccl.umist.ac.uk
Centre for Computational Linguistics
UMIST
PO Box 88
Sackville Street
Manchester, UK                         tel: +44.161.200.3098 (direct)
M60 1QD                                fax: +44.161.200.3099