[Corpora-List] corpora & chomsky

From: Florian Jaeger (tiflo@csli.stanford.edu)
Date: Thu Oct 14 2004 - 16:54:20 MET DST


    Hi,

    I agree with Bob. On the one hand, Chomsky (at least in his early work)
    sharply distinguishes between competence and performance (and any language
    data belongs to the performance category, including corpus data). On the
    other hand, he does not say that corpus data is 'defective' or
    'corrupted'. As Bob said, corpora do not provide explicit negative
    evidence. Statistically, though, the larger and better balanced a corpus
    is, the more likely it is that the absence of a structure (rather than of
    a specific string instantiating that structure) actually means that the
    structure does not exist in the language. Arguably, even current Gigaword
    corpora are still quite small.

    Schuetze (1996) wrote a master's thesis about 'The empirical basis of
    linguistics'. It discusses the competence-performance distinction as
    well as what kind of data is valid for which kind of argument. He focuses
    mostly on acceptability judgments but, as I recall, the book also
    contains quotes from Chomsky, and discussion by Schuetze, with regard to
    corpus work. Another book that touches on similar issues (from a
    different angle) is Wasow (2002), "Post-verbal behavior".

    Hope that helps,

    Florian


