Re: Corpora: Chomsky/Harris

From: Mary D. Taffet (mdtaffet@mailbox.syr.edu)
Date: Mon Apr 02 2001 - 00:29:43 MET DST


    Hello Steve,

    I thought perhaps I should mention on list what I told Tony privately
    the other day.

    When someone mentions parsing and transformations, Brill's
    transformation-based error-driven learning method is what comes to
    my mind almost immediately.

    If you take a look at his dissertation, he cites Harris quite a bit. It
    would appear that transformation-based error-driven learning is based
    somewhat on Harris's ideas. You can find Brill's dissertation on his
    old homepage (#11):

            http://www.cs.jhu.edu/~brill/acadpubs.html

    And since transformation-based error-driven learning is a robust machine
    learning paradigm that has been successfully used for a number of NLP
    tasks, such as part-of-speech tagging, parsing, prepositional phrase
    attachment, subordinate conjunction attachment, grammatical relation
    extraction and word segmentation, I told Tony that it is probably what
    he is looking for. Algorithms based on transformation-based
    error-driven learning can perform as well as or better than Hidden
    Markov Models.
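
    As a rough illustration of the idea (a toy sketch only, not Brill's
    actual implementation): start from a simple initial tagging, then
    repeatedly pick the rewrite rule that removes the most errors against
    a hand-tagged corpus. The three-sentence corpus, the single rule
    template ("change tag A to B when the previous tag is C"), and all
    function names below are assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: lists of (word, gold_tag) pairs.
TRAIN = [
    [("the", "DT"), ("race", "NN")],
    [("the", "DT"), ("race", "NN")],
    [("to", "TO"), ("race", "VB")],
]

def baseline_lexicon(train):
    """Initial-state annotator: map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in train:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): retag from_tag as to_tag
    wherever the preceding tag (before this rule fires) is prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def count_errors(current, gold):
    """Number of positions where the current tags disagree with gold."""
    return sum(c != g for cs, gs in zip(current, gold) for c, g in zip(cs, gs))

def learn_rules(train, max_rules=10):
    """Greedy error-driven loop: keep the rule with the largest net gain."""
    lexicon = baseline_lexicon(train)
    gold = [[t for _, t in sent] for sent in train]
    current = [[lexicon[w] for w, _ in sent] for sent in train]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining error (single template).
        candidates = set()
        for cs, gs in zip(current, gold):
            for i in range(1, len(cs)):
                if cs[i] != gs[i]:
                    candidates.add((cs[i], gs[i], cs[i - 1]))
        base = count_errors(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):
            retagged = [apply_rule(cs, rule) for cs in current]
            gain = base - count_errors(retagged, gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count any further
            break
        rules.append(best)
        current = [apply_rule(cs, best) for cs in current]
    return lexicon, rules

def tag(words, lexicon, rules, default="NN"):
    """Tag new text: baseline lookup, then the learned rules in order."""
    tags = [lexicon.get(w, default) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags

lexicon, rules = learn_rules(TRAIN)
print(rules)
print(tag(["to", "race"], lexicon, rules))
```

    On this toy corpus the baseline tags "race" as a noun everywhere, and
    the one learned rule repairs the case that follows "to". A real system
    would use many rule templates and a much larger corpus.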

    I do agree that there is something of a disconnect between theoretical
    linguistics and corpus linguistics, but I also think that this distance
    is being narrowed as each camp begins to realize that the other
    has useful methods to offer.

    As a person with two degrees in Linguistics (B.A. & M.A.) and almost 10
    years of full-time computer programming experience, I am fortunate to
    feel comfortable in either camp.

    -- Mary D. Taffet
       Syracuse University
       Ph.D. Student/School of Information Studies
       Research Analyst/Center for Natural Language Processing
       4-230 Center for Science & Technology
       Syracuse, NY 13244-4100
       mdtaffet@syr.edu
       

    Steve Seegmiller wrote:
    >
    > This is a reply to a query from Tony Perretta, which
    > a colleague forwarded to me. (I am not a subscriber to
    > this list.)
    >
    > First a point of clarification: Chomsky has never, to my
    > knowledge, "discredited" the use of corpora. There is a
    > bit of a terminological mix-up here, I think, in that
    > Chomsky did attack the idea that a corpus defines a
    > language; i.e. that a grammar should be based solely on
    > the data found in an observed corpus. His point (with
    > which you cannot disagree, if you look at the relevant
    > examples) is that no corpus, no matter how large, can
    > contain every sentence, or even every sentence type,
    > in the language; and furthermore, that many kinds of
    > perfectly good sentences (that the grammar should take
    > into account) have a probability of occurrence in a
    > given corpus that is indistinguishable from zero. The
    > conclusion is that a corpus is never enough.
    >
    > That is quite different from saying that corpora are
    > not useful sources of data. Anyone who has worked with
    > a large corpus has found many, many surprises there,
    > including lexical uses and syntactic constructions that
    > s/he would not have thought of otherwise.
    >
    > It is unfortunate that many people in the corpus
    > linguistics community have put themselves in opposition
    > to Chomskyan linguists. (At the recent conference on
    > Corpus Linguistics and Language Teaching in Boston,
    > several references were made to "the enemy" at MIT.
    > That is a most unfortunate, and unnecessary, view.)
    > There is no inherent incompatibility between theoretical
    > generative linguistics and corpus linguistics, and
    > by focussing on the enmity, many corpus linguists are
    > making it impossible to discuss the real issues
    > involved.
    >
    > Having said all that, I have a very little information
    > on Harris's approach to parsing and such things. Harris
    > developed, in addition to his transformational analysis,
    > something called string grammar, which was a non-
    > transformational kind of analysis which encoded certain
    > transformation-type information. It was much
    > easier, in the early days of computational linguistics,
    > to program string grammar than transformational grammar,
    > so several of Harris's students adopted string grammar
    > as the basis for parsers, information retrieval systems,
    > etc. One such project was the String Project at New York
    > University, directed by Naomi Sager. I believe it is still
    > in operation. Another implementation was built by Aravind
    > Joshi. I do not know specifically of any statistical
    > parsers based on Harrisian transformational grammar,
    > but parsing is not my field so there could well be some.
    >
    > Best wishes,
    >
    > Steve Seegmiller, Ph.D.
    > Linguistics Department
    > Montclair State University
    > Upper Montclair, NJ 07043
    >
    > seegmillerm@alpha.montclair.edu
    > http://www.chss.montclair.edu/linguistics/lingpage/faculty/seeg/seeg.htm


