Corpora: Chomsky/Harris

From: Steve Seegmiller (seegmillerm@alpha.montclair.edu)
Date: Sun Apr 01 2001 - 20:46:18 MET DST


    This is a reply to a query from Tony Perretta, which
    a colleague forwarded to me. (I am not a subscriber to
    this list.)

    First a point of clarification: Chomsky has never, to my
    knowledge, "discredited" the use of corpora. There is a
    bit of a terminological mix-up here, I think, in that
    Chomsky did attack the idea that a corpus defines a
    language; i.e. that a grammar should be based solely on
    the data found in an observed corpus. His point (with
    which you cannot disagree, if you look at the relevant
    examples) is that no corpus, no matter how large, can
    contain every sentence, or even every sentence type,
    in the language; and furthermore, that many kinds of
    perfectly good sentences (that the grammar should take
    into account) have a probability of occurrence in a
    given corpus that is indistinguishable from zero. The
    conclusion is that a corpus is never enough.

    That is quite different from saying that corpora are
    not useful sources of data. Anyone who has worked with
    a large corpus has found many, many surprises there,
    including lexical uses and syntactic constructions that
    s/he would not have thought of otherwise.

    It is unfortunate that many people in the corpus
    linguistics community have put themselves in opposition
    to Chomskyan linguists. (At the recent conference on
    Corpus Linguistics and Language Teaching in Boston,
    several references were made to "the enemy" at MIT.
    That is a most unfortunate, and unnecessary, view.)
    There is no inherent incompatibility between theoretical
    generative linguistics and corpus linguistics, and
    by focussing on the enmity, many corpus linguists are
    making it impossible to discuss the real issues
    involved.

    Having said all that, I have only a little information
    on Harris's approach to parsing and such things. Harris
    developed, in addition to his transformational analysis,
    something called string grammar, which was a
    non-transformational kind of analysis that encoded certain
    transformation-type information. It was much
    easier, in the early days of computational linguistics,
    to program string grammar than transformational grammar,
    so several of Harris's students adopted string grammar
    as the basis for parsers, information retrieval systems,
    etc. One such project was the String Project at New York
    University, directed by Naomi Sager. I believe it is still
    in operation. Another implementation was built by Aravind
    Joshi. I do not know specifically of any statistical
    parsers based on Harrisian transformational grammar,
    but parsing is not my field so there could well be some.

    Best wishes,

    Steve Seegmiller, Ph.D.
    Linguistics Department
    Montclair State University
    Upper Montclair, NJ 07043

    seegmillerm@alpha.montclair.edu
    http://www.chss.montclair.edu/linguistics/lingpage/faculty/seeg/seeg.htm


