I think what Belén refers to is Chomsky's criticism (in Aspects of the
Theory of Syntax, 1965) of the 'defective' kind of (E-)language that corpora
may contain. I quote from a recent article by Jan Aarts (entitled "Does corpus
linguistics exist? Some old and new issues", published in Anna-Brita
Stenström's festschrift, 2002?; sorry, I don't have the exact reference at
hand) which includes the Chomsky 1965 quote:
"At the same time it must be said that there is a not inconsiderable number
of utterances that one comes across in corpora but will look in vain for in
descriptive grammars of language use. Among them are broken-off sentences,
false starts, repetitions of phonemes, morphemes, words and (parts of)
larger constituents, anacolutha, stretches of text from other languages or
from sub-standard varieties, as well as utterances that the speaker or
writer intended to be ungrammatical; in short, corpora contain among other
things evidence of “such grammatically irrelevant conditions as memory
limitations, distractions, shifts of attention and interest and errors ...”
(Chomsky 1965: 3)."
Best wishes... Ute
Just found the reference on the Rodopi website:
From the COLT’s mouth ... and others’.
Language Corpora Studies. In honour of Anna-Brita Stenström.
BREIVIK, Leiv Egil and Angela HASSELGREN (Eds.)
Amsterdam/New York, NY, 2002, X, 260 pp.
_____
From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
Behalf Of Shlomo Izre'el
Sent: Thursday, October 14, 2004 6:00 PM
To: Corpora list
Subject: [Corpora-List] Re: Chomsky
I don't have the original by Leech, but here is what I have in my files:
"Any natural corpus will be skewed. Some sentences won't occur because they
are obvious, others because they are false, still others because they are
impolite. The corpus, if natural, will be so wildly skewed that the
description would be no more than a mere list."
(Chomsky in Leech, The State of the Art in Corpus Linguistics, 1991, p. 8)
Shlomo Izre'el
On Oct 14, 2004, at 4:08 PM, Bob Knippen wrote:
Mª Belén Díez Bedmar wrote:
> I'm looking for the exact bibliographical reference where we can find
> Chomsky's idea that a corpus presents a language that is defective or
> corrupted.
To my knowledge, he never says any such thing.
He does say, in several places (Syntactic Structures, 1957 comes to
mind), that corpora do not provide the kind of information about
linguistic competence that Linguistics ought to be after.
In particular, he says that corpora do not provide information about
what is ungrammatical, and he says something to the effect that
corpora, being finite, do not shed light on the infinite generative
capacity of language. (That is, a statistical model based on a
particular corpus is not a model of the language in general.)
I very much doubt he wrote that a corpus presents a language that is
defective or corrupted.
Bob
--
Bob Knippen
Computer Science Department
110 Volen Center
Mail Stop 018
Brandeis University
415 South Street
Waltham, MA 02254-9110
781-736-2745
http://www.cs.brandeis.edu/~knippen
_______________________________________________________
Shlomo Izre'el
Professor of Semitic Linguistics
Department of Hebrew and Semitic Languages
Webb Building #516
Tel Aviv University
POB 39040
IL-61390 Tel Aviv
Israel
Tel. +972-3-640 5016
Fax. +972-3-640 7031
     +972-3-640 9457

Home address:
Simtat Neve-Tsedek 7
IL-65154 Tel Aviv
Israel
Tel. +972-3-517 5341
Fax. +972-3-510 1867

izreel@post.tau.ac.il
http://www.tau.ac.il/humanities/semitic/izreel.html
The Corpus of Spoken Israeli Hebrew:
http://www.tau.ac.il/humanities/semitic/maamad.html (Hebrew text)
http://www.tau.ac.il/humanities/semitic/cosih.html (English text)
This archive was generated by hypermail 2b29 : Fri Oct 15 2004 - 11:01:23 MET DST