RE: [Corpora-List] Chomsky

From: Mcenery, Tony (a.mcenery@lancaster.ac.uk)
Date: Thu Oct 14 2004 - 17:06:48 MET DST


    Dear All,
     
    Chomsky has rarely discussed corpora, and he has certainly never said anything quite like 'corpora contain degenerate language so I do not like them!'. However, he has said things about spontaneous speech that would lead one to conclude that he views corpora built from such material as representing degenerate language. For example, he claims that children are exposed to "a highly degenerate sample, in the sense that much of it must be excluded as irrelevant and incorrect - thus the child learns rules of grammar that identify much of what he has heard as ill-formed, inaccurate, and inappropriate" (Chomsky, 1972, Language and Mind, Harcourt Brace, pp. 170-171).

    Others have certainly read this as saying that spoken language (and, by extension, a spoken language corpus) is degenerate. For example, Pateman (1982) summarises the view as "the child emerges with a grammar (or grammars) with infinite generative power after exposure to a finite and, Chomsky would say, small and often degenerate corpus of speech which is addressed to it".

    Hope this helps. Best,
     
    Tony
     
    P.S. Pateman's work is on the web, see http://www.selectedworks.co.uk/chomskypapert.html

    ________________________________

    From: owner-corpora@lists.uib.no on behalf of Bob Knippen
    Sent: Thu 14/10/2004 15:08
    To: corpora
    Subject: Re: [Corpora-List] Chomsky

    Mª Belén Díez Bedmar wrote:

    > I'm looking for the exact bibliographical reference where we can find
    > Chomsky's idea that a corpus presents a language that is defective or
    > corrupted.

    To my knowledge, he never says any such thing.

    He does say, in several places (Syntactic Structures, 1957, comes to
    mind), that corpora do not provide the kind of information about
    linguistic competence that linguistics ought to be after.

    In particular, he says that corpora do not provide information about
    what is ungrammatical, and he says something to the effect that
    corpora, being finite, do not shed light on the infinite generative
    capacity of language. (That is, a statistical model based on a
    particular corpus is not a model of the language in general.)

    I very much doubt he wrote that a corpus presents a language that is
    defective or corrupted.

    Bob

    --
    Bob Knippen                            
    Computer Science Department
    110 Volen Center
    Mail Stop 018
    Brandeis University    
    415 South Street                       
    Waltham, MA 02254-9110                 
    781-736-2745                                   
    http://www.cs.brandeis.edu/~knippen
    


