Re: Corpora: Foundations

Gordon and Pam Cain (gpcain@rivernet.com.au)
Sun, 25 Jul 1999 22:16:00 +1000

"James L. Fidelholtz" wrote:

> Well, I certainly wouldn't want to take researcher judgment out of the
> equation, especially in obvious cases (many of which, such as eg
> 'we3ll', could probably be eliminated as weirdos automatically).
> Nevertheless, it seems to me that one of the principal points of corpus
> linguistics is precisely to detect those groused-at words and
> constructions that have in fact become part of the language.

I agree, and these are not my concern.

> Of course,
> we then have to figure out, perhaps non-automatically, WHY and/or HOW
> this has come about,

Good, fair and interesting.

> but the figures (and of course we have to decide
> also at what point to draw the line for general or specific
> acceptability), we suppose, cannot be argued with, at least if the
> corpus is in some sense representative.

But what if I compose a newspaper corpus (just for example)? I can
seldom read a newspaper for long without encountering some infelicitous
example of writing that nearly any other journalist would reject as
'bad'. And sometimes I encounter words just plainly misused (not in the
pedantic sense of misused, but rather used in a sense that no one else
gives them -- usually through confusion with another word). These should
not be included in either a grammar or a lexicon as representative of
typical journalistic grammar, or of typical word use.

Surely not everything that a corpus turns up should be accepted as
representative. (I realise this threatens to put our objectivity and
very necessary impartiality in jeopardy.) Is the answer simply to
require a certain number of instances in my corpus to confirm an
unexpected or surprising usage?
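
To make the question concrete, here is a minimal sketch of such a
threshold, not a recommendation. It assumes each observation has already
been reduced to a (usage, source) pair, and it adds one assumption of my
own: besides a minimum number of instances, a usage must also turn up in
several different sources, so that one writer's repeated slip does not
pass the test. The function name, both cut-off values and the sample
labels are hypothetical.

```python
from collections import defaultdict

def attested_usages(observations, min_count=5, min_sources=3):
    """Return the usages that meet both thresholds.

    observations: iterable of (usage, source_id) pairs, e.g. one pair
    per corpus hit. min_count and min_sources are arbitrary
    illustrative cut-offs, not established values.
    """
    counts = defaultdict(int)    # usage -> number of instances
    sources = defaultdict(set)   # usage -> distinct sources using it
    for usage, source_id in observations:
        counts[usage] += 1
        sources[usage].add(source_id)
    return {
        usage
        for usage in counts
        if counts[usage] >= min_count and len(sources[usage]) >= min_sources
    }

# Hypothetical data: usage_a is frequent but comes from a single writer,
# usage_b is spread over five writers, usage_c is simply rare.
observations = [("usage_a", "writer_1")] * 6
observations += [("usage_b", f"writer_{i}") for i in range(1, 6)]
observations += [("usage_c", "writer_2"), ("usage_c", "writer_3")]

accepted = attested_usages(observations)
```

With these made-up figures only usage_b passes. Where to set the two
cut-offs is of course exactly the judgment the question is about; the
sketch only makes that judgment explicit.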

Take care,
Gordon

-- 
Gordon Cain, Teacher of ESOL
TAFE International Education Centre, Liverpool
Sydney, Australia
gpcain@rivernet.com.au