Re: Corpora: Foundations

James L. Fidelholtz (jfidel@siu.buap.mx)
Sat, 24 Jul 1999 09:38:01 -0500 (CDT)

On Sat, 24 Jul 1999, Gordon and Pam Cain wrote:

[snip]

>1. It seems to me that if we accept anything and everything that may
>turn up on a trawl through a corpus, then we will end up including
>utterances made in error, made under false understandings of semantic
>content, made idiosyncratically, made in archaic form, or that are the
>result of typographical errors. Thus it rather strikes me that there is
>always the need for human, subjective judgement in the results of any
>corpus.
>
>2. As to the matter of language change, or differences of opinion as to
>what is grammatical (eg, our prep + who/whom discussion) this gets a bit
>more tricky. Certainly I say that these utterances occur with
>regularity. But does that make them equally acceptable English in every
>case with the 'more correct' forms? If I am teaching my students
>academic English, I will warn them off such utterances, as they will be
>viewed as sub-standard and will hurt their mark most likely. Clearly
>what occurs is not always what is acceptable.

[snip]

>To return to the question: prep+ who/whom is a borderline case -- as are
>perhaps a great many cases -- but shouldn't proper corpus linguistics
>acknowledge the need for human judgement, to prevent blind empirical
>methods from returning unique results that the original speaker
>him/herself would reject in hindsight?

Well, I certainly wouldn't want to take researcher judgment out of the
equation, especially in obvious cases (many of which, such as
'we3ll', could probably be eliminated as weirdos automatically).
Nevertheless, it seems to me that one of the principal points of corpus
linguistics is precisely to detect those groused-at words and
constructions that have in fact become part of the language. Of course,
we then have to figure out, perhaps non-automatically, WHY and/or HOW
this has come about, but the figures (and of course we have to decide
also at what point to draw the line for general or specific
acceptability), we suppose, cannot be argued with, at least if the
corpus is in some sense representative.
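
As a rough illustration of the two steps mentioned above (automatic
removal of obvious oddities such as 'we3ll', then a frequency cutoff),
here is a minimal Python sketch. The digit-inside-a-word pattern, the
function names, and the per-million threshold are all assumptions made
for illustration, not an established method; where to draw the line
remains a matter of judgment.

```python
import re
from collections import Counter

# Assumed heuristic: a token is suspicious if digits sit between letters,
# as in 'we3ll'. Real corpora would need a richer set of checks.
DIGIT_INSIDE_WORD = re.compile(r"[A-Za-z]\d+[A-Za-z]")


def split_tokens(tokens):
    """Separate tokens into (kept, flagged) using the heuristic above."""
    kept, flagged = [], []
    for tok in tokens:
        if DIGIT_INSIDE_WORD.search(tok):
            flagged.append(tok)
        else:
            kept.append(tok)
    return kept, flagged


def forms_above_threshold(tokens, min_per_million):
    """Return the forms whose relative frequency reaches the cutoff.

    The cutoff is a hypothetical parameter; choosing it is the
    non-automatic decision about where to draw the line.
    """
    counts = Counter(tok.lower() for tok in tokens)
    total = sum(counts.values())
    return {
        form: n
        for form, n in counts.items()
        if n * 1_000_000 / total >= min_per_million
    }
```

The flagged tokens are returned rather than discarded, so that a
researcher can still look them over before they are thrown out.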
Jim

James L. Fidelholtz e-mail: jfidel@siu.buap.mx
Maestría en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO