Re: [Corpora-List] Chomsky

From: T. Florian Jaeger (tiflo@stanford.edu)
Date: Thu Oct 14 2004 - 16:49:27 MET DST

    Hi,

    I agree with Bob. On the one hand, Chomsky (at least in his early work)
    sharply distinguishes between competence and performance (and any language
    data belongs to the performance category, including corpus data). On the
    other hand, he does not say that corpus data is 'defective' or 'corrupted'.
    As Bob said, corpora do not provide explicit negative evidence (although,
    statistically, the larger and more balanced a corpus is, the more likely
    it is that the absence of a structure [rather than of a specific string
    instance of that structure] actually means that the structure does not
    exist in the language; arguably, though, even current Gigaword corpora are
    still quite small).

    Schuetze (1996) wrote a master's thesis about 'The empirical basis of
    linguistics'. It contains discussions of the competence - performance
    distinction as well as what kind of data is valid for which kind of
    arguments. He focuses mostly on acceptability judgments but, as I recall,
    the book contains quotes from Chomsky and discussion by Schuetze with
    regard to corpus work as well. Another book that touches on similar issues
    (from a different angle) is Wasow (2002) "Post-verbal behavior".

    Hope that helps,

    Florian

    At 10:08 AM 10/14/2004 -0400, Bob Knippen wrote:

    >Mª Belén Díez Bedmar wrote:
    >
    > > I'm looking for the exact bibliographical reference where we can find
    > > Chomsky's idea that a corpus presents a language that is defective or
    > > corrupted.
    >
    >To my knowledge, he never says any such thing.
    >
    >He does say, in several places (Syntactic Structures, 1957 comes to
    >mind), that corpora do not provide the kind of information about
    >linguistic competence that Linguistics ought to be after.
    >
    >In particular, he says that corpora do not provide information about
    >what is ungrammatical, and he says something to the effect that
    >corpora, being finite, do not shed light on the infinite generative
    >capacity of language. (That is, a statistical model based on a
    >particular corpus is not a model of the language in general).
    >
    >I very much doubt he wrote that a corpus presents a language that is
    >defective or corrupted.
    >
    >Bob
    >
    >
    >--
    >Bob Knippen
    >Computer Science Department
    >110 Volen Center
    >Mail Stop 018
    >Brandeis University
    >415 South Street
    >Waltham, MA 02254-9110
    >781-736-2745
    >http://www.cs.brandeis.edu/~knippen
    >
    >

    T. Florian Jaeger
    Ph.D. student
    Linguistics Department,
    Stanford University,
    MJH, Bldg. 460,
    Stanford, CA 94305-2150,
    USA
    Phone: +1 (650) 725 2323
    Fax: +1 (650) 723 5666
    Email: tiflo@stanford.edu
    Url: http://www.stanford.edu/~tiflo/

    From 09/2004 to 12/2004
    Visiting Student
    Department of Linguistics & Philosophy,
    MIT,
    77 Massachusetts Avenue, 32-D808,
    Cambridge, MA 02139,
    USA
    Phone: +1 (650) 799 2631
    Fax: +1 (617) 253 5017
    Email: tiflo@mit.edu


