RE: Corpora: Chomsky and corpus linguistics

From: Mcenery, Tony (eiaamme@exchange.lancs.ac.uk)
Date: Sun Apr 08 2001 - 17:34:49 MET DST


    Dear All,

    I have followed this thread with interest, as I am sure many have. Speaking as
    what Christopher Bader has identified as a Philistine, let me try my best to
    have a little snip at Samson's locks. In doing so I am putting my head above
    the parapet - feel free to shoot.

    > 1. It is simply wrong to contend that Chomsky has contributed
    > nothing to language technology. His work in the 1950's and '60's
    > laid part of the foundation for formal language theory. See any
    > textbook on automata and theory of computation, on the Chomsky
    > Hierarchy or Chomsky Normal Form.
    >
            [Mcenery, Tony]

            While, in terms of early formal AI approaches to modelling language,
    generative theory was seen as a promising field, I would contend that the
    promise was never fulfilled. The usable language technology that I know of now
    owes the greatest debt to corpus-based approaches to the study of language.
    Christopher argues that early on Chomsky made a great - though incidental -
    contribution to language technology work by laying the foundations of formal
    language theories. However, this rather reminds me of those people who point
    proudly at Velcro fasteners or Teflon coated pans and say 'that was developed
    for the Apollo space programme!' when defending sending people to the moon,
    i.e. the trip may not have been worth the price tag in itself, but look at the
    side benefits! Surely there were cheaper ways to develop modern conveniences
    than to spend millions of dollars sending a few Americans to the moon.
    Similarly, developing a very dominant school of linguistics seems to have been
    a rather heavy-handed way to lay the foundation of formal language theory.

    > 2. In his more recent work, Chomsky distinguishes between
    > the E-language (e.g. the set of all grammatical sentences)
    > and the I-language (the human language faculty). Generative
    > grammarians study the latter; corpus linguists, the former.
    > The Chomsky Hierarchy and Chomsky Normal Form are
    > of course concepts pertaining to the E-language, not to
    > the I-language, which is why Chomsky no longer works
    > in this area.
    >
            [Mcenery, Tony]
            I see no problem with the above statement, other than to say that at
    times linguistics has excluded the study of E-language (in the sense of
    attested language use as opposed to the concoction of invented examples) from
    linguistics proper. The would-be Samsons on this list have said
    that corpus linguists simply misunderstand this or that view taken by
    Chomsky/generativists. What they don't understand is that most corpus linguists
    (I guess) on the list feel entirely misunderstood by linguists working in the
    Chomskyan paradigm. Take a recent quote from Smith (Smith, N. Chomsky, Ideas
    and Ideals, CUP, 2000:33) discussing concocted examples: "Appealing to examples
    as complex as these often strikes non-linguists as bordering on obscurantism:
    a frequent objection is 'no one would actually say that' or 'no corpus of real
    utterances would contain such examples.' This reflects an unmotivated
    preoccupation with facts about performance". The line taken by Smith - and he
    claims to be reflecting the views of Chomsky - is disconcerting for a corpus
    linguist. Smith continues to argue that scientists work on idealised examples
    and that people using 'common sense' misunderstand the true goals of science.
    In characterising linguistics in this way, Smith arguably casts corpus
    linguists as non-linguists and non-scientific. Corpus linguists - not simply
    'non-linguists' - would raise, and have raised, the objections Smith outlines. Corpus
    linguists do not have an unmotivated preoccupation with facts about performance
    - their preoccupations are often highly motivated though not, perhaps, in a
    theoretical framework that Chomsky or his followers would approve of. While I
    appreciate I am not quoting directly from Chomsky here, I think it is quite
    relevant to point out how in the presentation of his ideas the work and worth
    of corpus linguistics is often grossly misrepresented by those linguists who
    work in the tradition Chomsky has established.

    > Since generative linguists and computational linguists
    > have fundamentally different objects of study, it is not
    > surprising that they sometimes have trouble understanding
    > each other's work. I urge people on this list who are interested
    > in Chomsky's actual views to read Knowledge of Language:
    > Its Nature, Origin, and Use (1986). It lays out in well-reasoned,
    > non-technical prose the arguments for the E-language/I-language
    > distinction.
    >
            [Mcenery, Tony]
            Of course it is not just computational and generative linguists who
    have different objects of study - as you note yourself, linguists focusing on
    I-language and E-language also have different objects of study. Coming back to your point
    about generative linguists and computational linguists having fundamentally
    different objects of study, it is that realisation which has principally, in my
    view, led to computational linguists lining up with corpus linguists. It was
    the needs of those corpus linguists that drove language technology work in the
    eighties away from what one may call cognitively plausible models of language
    towards the development of systems which work in ways largely non-comparable to
    human language processing. The shift to modelling language based on attested
    language use rather than engaging with abstract theorising about idealised
    speaker-hearer pairs was, I believe, the key to progress in natural language
    processing. Beyond the language technology community, however, I would also
    claim that the focus on corpus data by some linguists has also led to more
    practical applications of linguistics than work conducted in the Chomskyan
    paradigm ever will. I know from previous mailings and readings that Chomsky is
    'hands off' about the applications of his work: if others can apply it, so be
    it, but that is not his aim. However, it may well be the case that the theories
    generated by him have few if any practical applications, though I guess the
    Samsons are now going to tell me all of the practical applications of the
    minimalist paradigm that there are!

            Tony


