RE: [Corpora-List] Re: Chomsky

From: Ute Römer (ute.roemer@Uni-Koeln.DE)
Date: Thu Oct 14 2004 - 19:34:57 MET DST

    I think what Belén refers to is Chomsky's criticism (in Aspects of the
    Theory of Syntax, 1965) of the 'defective' kind of (E-)language corpora may
    contain. I quote from a recent article by Jan Aarts (entitled "Does corpus
    linguistics exist? Some old and new issues", published in Anna-Brita
    Stenström's festschrift, 2002?; sorry, I don't have the exact reference at
    hand) which includes the Chomsky 1965 quote:

     

    "At the same time it must be said that there is a not inconsiderable number
    of utterances that one comes across in corpora but will look in vain for in
    descriptive grammars of language use. Among them are broken-off sentences,
    false starts, repetitions of phonemes, morphemes, words and (parts of)
    larger constituents, anacolutha, stretches of text from other languages or
    from sub-standard varieties, as well as utterances that the speaker or
    writer intended to be ungrammatical; in short, corpora contain among other
    things evidence of “such grammatically irrelevant conditions as memory
    limitations, distractions, shifts of attention and interest and errors ...”
    (Chomsky 1965: 3)."

     

    Best wishes... Ute

     

    Just found the reference on the Rodopi website:

    From the COLT’s mouth ... and others’.
    Language Corpora Studies. In honour of Anna-Brita Stenström.
    BREIVIK, Leiv Egil and Angela HASSELGREN (Eds.)
    Amsterdam/New York, NY, 2002, X, 260 pp.

      _____

    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Shlomo Izre'el
    Sent: Thursday, October 14, 2004 6:00 PM
    To: Corpora list
    Subject: [Corpora-List] Re: Chomsky

     

    I don't have the original by Leech, but here is what I have in my files:
    "Any natural corpus will be skewed. Some sentences won't occur because they
    are obvious, others because they are false, still others because they are
    impolite. The corpus, if natural, will be so wildly skewed that the
    description would be no more than a mere list."
    (Chomsky in Leech, The State of the Art in Corpus Linguistics, 1991, p. 8)
    Shlomo Izre'el

    On Oct 14, 2004, at 4:08 PM, Bob Knippen wrote:

    Mª Belén Díez Bedmar wrote:

    > I'm looking for the exact bibliographical reference where we can find
    > Chomsky's idea that a corpus presents a language that is defective or
    > corrupted.

    To my knowledge, he never says any such thing.

    He does say, in several places (Syntactic Structures, 1957 comes to
    mind), that corpora do not provide the kind of information about
    linguistic competence that Linguistics ought to be after.

    In particular, he says that corpora do not provide information about
    what is ungrammatical, and he says something to the effect that
    corpora, being finite, do not shed light on the infinite generative
    capacity of language. (That is, a statistical model based on a
    particular corpus is not a model of the language in general).

    I very much doubt he wrote that a corpus presents a language that is
    defective or corrupted.

    Bob

    -- 
    Bob Knippen 
    Computer Science Department
    110 Volen Center
    Mail Stop 018
    Brandeis University 
    415 South Street 
    Waltham, MA 02254-9110 
    781-736-2745 
    http://www.cs.brandeis.edu/~knippen
    

    _______________________________________________________
    Shlomo Izre'el
    Professor of Semitic Linguistics
    Department of Hebrew and Semitic Languages
    Webb Building #516
    Tel Aviv University
    POB 39040
    IL-61390 Tel Aviv
    Israel
    Tel. +972-3-640 5016
    Fax. +972-3-640 7031
         +972-3-640 9457

    Home address:
    Simtat Neve-Tsedek 7
    IL-65154 Tel Aviv
    Israel
    Tel. +972-3-517 5341
    Fax. +972-3-510 1867

    izreel@post.tau.ac.il
    http://www.tau.ac.il/humanities/semitic/izreel.html

    The Corpus of Spoken Israeli Hebrew:
    http://www.tau.ac.il/humanities/semitic/maamad.html (Hebrew text)
    http://www.tau.ac.il/humanities/semitic/cosih.html (English text)



    This archive was generated by hypermail 2b29 : Fri Oct 15 2004 - 11:01:53 MET DST