RE: Corpora: NLP AND THE BEST THEORY OF SYNTAX

Philip A. Bralich, Ph.D. (bralich@hawaii.edu)
Wed, 18 Feb 1998 10:26:25 -1000

I would like to respond to d'Armond Speers's comments in this post. This
post is also rather long; I am trying to snip where I can, but these are
important arguments, and I want the reader to be able to follow the thread
as conveniently as possible. I apologize for the length.

At 02:43 AM 2/18/98 -1000, d'Armond Speers wrote:
>> On March 17th I will be giving a talk at the University of Hawaii's
>> Linguistic Department Tuesday Seminar called, "The Best Theory of Syntax."
>> In this talk I intend to make the rather non-controversial point that the
>> best theory of syntax must necessarily be the one that demonstrates itself
>> to be most completely implemented in a programming language.
>
>I find this point controversial. You say the simplification is justified,
>but you make assumptions about why we study syntax, which may not be
>justified. See below.
>
>> Some might argue that I am merely putting complex arguments into simple
>> language but these arguments have substance and effect in either simple or
>> complex language. This is especially true when we are dealing with the
>> application of syntax to a multi-billion dollar industry such as NLP.
>
>I would propose that your claim be revised: "the best theory of syntax *for
>commercial NLP* must necessarily be the one that demonstrates itself to be
>most completely implemented in a programming language." I find this version
>of the claim uncontroversial.

I certainly don't disagree with your narrower claim, but I also think
the narrowing is unnecessary. The main point of my argument is that the
nature of syntactic rules is such that they can all be implemented in a
program, and for that reason the computer is a great testing ground for a
theory's scope and efficiency. If you can find rules of grammar that
cannot (in principle) be implemented in a program, you potentially have
a problem for my argument, or perhaps a problem for that theory. Just
by looking at phrase structure rules we can see the point, as in the
rules for "the cat sat on the mat":

S --> NP VP
VP --> V PP
PP --> P NP
NP --> (det) N
det --> the, a
N --> cat, mat
V --> sat
P --> on

Not only is there no principled reason these cannot be programmed, they
actually form the basis of the theory behind most compilers (see _Compilers:
Principles, Techniques, and Tools_ by Alfred V. Aho, Ravi Sethi, and
Jeffrey D. Ullman. 1986. Addison-Wesley). Theories of syntax and
programming languages are so closely interrelated that computers form
a natural proving ground for theories of syntax, whatever their goal.
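
To see how direct the translation is, here is a minimal sketch of my own
(for illustration only; this is not Ergo's code) that turns those rules
into a recursive-descent recognizer in C++, one function per rule:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Sketch only: the phrase structure rules above, one function per rule.
static std::vector<std::string> toks;  // the tokenized input sentence
static size_t pos = 0;                 // current position in the input

// Consume the next token if it matches w; otherwise consume nothing.
bool match(const std::string& w) {
    if (pos < toks.size() && toks[pos] == w) { ++pos; return true; }
    return false;
}

bool Det() { return match("the") || match("a"); }   // det --> the, a
bool N()   { return match("cat") || match("mat"); } // N   --> cat, mat
bool V()   { return match("sat"); }                 // V   --> sat
bool P()   { return match("on"); }                  // P   --> on
bool NP()  { Det(); return N(); }                   // NP  --> (det) N
bool PP()  { return P() && NP(); }                  // PP  --> P NP
bool VP()  { return V() && PP(); }                  // VP  --> V PP
bool S()   { return NP() && VP(); }                 // S   --> NP VP

int main() {
    std::istringstream in("the cat sat on the mat");
    for (std::string w; in >> w; ) toks.push_back(w);
    bool ok = S() && pos == toks.size();  // whole input must be consumed
    std::cout << (ok ? "grammatical" : "ungrammatical") << "\n";
    return 0;
}

Note that even the optional determiner in NP --> (det) N costs nothing:
it is just a call whose result we are free to ignore.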

>> More specifically, I intend to present the argument that the best
>> independent and objective measure of a theory of syntax's overall
>> effectiveness is its ability to generate, in a computer program, standard
>> grammatical structures and to manipulate these structures in the same way
>> as users of the language being described.
>
>I am confused on this point. What is a "standard" grammatical structure?
>For surely, the grammatical structures used in a theory are
>theory-dependent. Your sentence suggests that every parser should be able
>to generate similar structures, which would basically ignore the essential
>contributions of different theories of syntax. Just compare the
>representations used in relational grammar with those of GB; they
>demonstrate fundamentally different principles.

By "standard" I am referring to all those phenomena of theoretical linguistics
and grammar that every theory must account for: in linguistics it would
be things like crossover phenomena, the that-trace effect, island conditions
and so on and from standard grammar it would refer to the ability to recognize
acceptable and unacceptable passives, actives, tenses, statements and so on.
Just the standard suite of the "what is" of grammar. Note the standards I
propose are based on working with a minimum set of these types of structures.

>I also find it surprising that you claim to know how users of a language
>manipulate grammatical structures. If we knew that, we could all go home!
>Seriously, I believe this claim is guilty of the same simplification as your
>goal of generating "standard" structures; different theories will make
>different assumptions about how structures they are representing are
>manipulated. Theories of syntax are models, tools for understanding complex
>properties of language. I don't think the claim is that these models
>represent derivational or representational strategies used by users of
>language; rather, they are tools to learn about what kinds of theoretical
>principles may account for the use of language.

Again I am referring to a core set of very ordinary phenomena: actives
and passives, tenses, internal clauses, and so on. There are certainly
grayer areas, but if this core set has not been properly handled, there
is no way a theory is ready for the gray areas.

>> That is, I intend to argue that
>> the best theory of syntax is the one that produces the best parsers.
>
>If the goal of developing theories of syntax were to develop parsers, then
>you could conclude that the best parser (given your measures below)
>represented the best theory of syntax, assuming that all theories of syntax
>had devoted equal resources to developing parsers. I do not agree with the
>implicit assumption that the goal of developing theories of syntax is to
>develop parsers. I would propose that the goal of linguistics is to gain
>insight into the human mind, a mind which has as one of its main
>distinguishing features the ability to learn and use language. One possible
>application of this is for parsers, but I think there are plausible reasons
>for pursuing both applied and theoretical research in linguistics.

My arguments above are the same here: parsers are made of the same stuff
as syntax and therefore are a very natural proving ground for theories,
whatever one's purpose in making them in the first place. Remember, what
I am getting at is this basic core of phenomena that ANY theory MUST
handle, and then looking at the theories side by side. If a theory cannot
recognize a passive and an active, a statement and a question, you have
to agree that theory is hardly out the door. Further, if a theory can
recognize those structures but cannot change one into the other, you
certainly cannot say that the theory has in any sense done a thorough job
of analyzing those structures. How can you say it recognizes an identity
of some sort between "John ate the apple" and "the apple was eaten by
John" if it cannot change one into the other?

>> Following that I will present a very ordinary set of standards for the
>> evaluation of parsers and then based on the comparison of theories using
>> those standards, I will argue that the theory of syntax that underlies the
>> Ergo Linguistic Technologies' parser is the best theory of syntax and that
>> all others should be relegated to the scrap heap of "wannabe" theories
>> until such time as they can produce equal or better parsers.
>
>As the "President and CEO" of Ergo Linguistic Technologies, your motives
>become clear. This no longer sounds like a linguistic argument, but a
>commercial.

Yes, granted, but there is this odd reality that I have to deal with: the
one theory of syntax that can do all of this happens to belong to a
commercial rather than an academic institution.

>> The logic
>> that I will present to support this is:
>>
>> 1) if there is ever to be a way to determine which of the
>> competing, extant theories of syntax is preferable to the others,
>
>The assumption that one *can* identify a single, preferable theory of syntax
>ignores the fact that different theories are developed to explore different
>ways of modeling principles of language. (And not necessarily to explore
>different ways of modeling the same principles of language). What I mean is,
>you need to use the right tool for the right job. If I want to understand
>how relationships between elements may affect structure, I'll study
>Perlmutter. If I want to explore the role of the lexicon, I'll study LFG.
>If I want to find a theory of syntax that is most amenable to computational
>modeling, I'll study Ergo.
>
>Let me state the point in another way. I have a colleague who posits that
>theories of syntax, while useful for developing our understanding of
>language, are less-useful for developing computational systems. There's an
>entire field devoted to exploring the application of mathematical processes
>to language/text processing (and many of these folks are on this list), and
>researchers in this field will probably argue that these statistical
>approaches are equally effective, without requiring any knowledge of
>theories of syntax.

However, this is the reason that the entirety of NLP is currently restricted
to just three areas of large text modelling (quoting from the TREC pages):

Document Detection: the capability to locate documents containing the
type of information the user wants from either a text stream or a store
of documents.

Information Extraction: the capability to locate specified information
within a text.

Summarization: the capability to condense the size of a document or
collection while retaining the key ideas in the material.

Specifically, these statistical parsers run through mountains of text looking
for key words and then report back. They do little if any parsing. However,
there is absolutely no question that if they actually parsed the sentences
they were searching, they could provide a 1000% increase in accuracy and
scope. The ability to parse free, unrestricted text is still a ways off, but
when it is available the statistical parsers will be gone overnight, or at
best they will become a minor supplement to the parsers. We are looking at
working with such large corpora ourselves, but recognize that it will be some
time before we have a dictionary of the required size and before we can turn
ourselves away from the other areas of parsing that clearly offer a much
larger return to our investors (e.g. increasing the number of commands in
speech rec systems from hundreds to thousands).

>Does that mean that language users manipulate
>linguistic structures based upon statistical properties of text? Not
>necessarily. It means that some statistical techniques are well-suited to
>developing certain types of NLP applications. Does that mean that
>statistics is the best theory of syntax? If your premises are sound, then
>you must be prepared to accept this possible conclusion.

Again, just note the level of increase in scope and accuracy that will result
when you actually parse the sentences that are being handled by these
statistical analysis devices.

I think central to my assumptions is that if a theory of syntax is complete,
it will handle all of these tasks with equal skill. When it comes to
working with language, NOTHING will be better for any of the tasks than
the ability to handle precisely the standards that are outlined in my
previous post. It is very important to look at those standards closely
and ask the question: which theory of syntax, or which application of
a theory of syntax, can be excused from these standards?

>> 2) since computers have the ability to represent and execute
>> binary algorithms, any theory that is composed of binary algorithms should
>> be able to be implemented in a programming language. Thus, any theory of
>> syntax that has reached a level of maturity should be able to represent its
>> generalizations in working parsers.
>
>I will not argue this point; I believe every first-year student in
>computational linguistics has experienced this insight. The non sequitur is
>your claim that the quality of a theory of syntax should be measured by the
>performance of a parser. Theories of syntax have other uses, and rejecting
>one based upon the performance of a parser would be whimsical. Perhaps
>computer technology has not yet advanced to the state where we can
>effectively program algorithms that capture principles of language; this
>doesn't mean that the models of language are immature, it means the
>technology is. To discard all other theories of language on this basis
>would be whimsical.

Again, I recommend you take a look at rules from a variety of theories
and then ask the question, "Is it possible to implement this in a
program?" The answer is always yes, in principle. Then ask yourself
whether that theory can meet the basic standards illustrated in my post,
and the answer will be no. This is precisely the problem I am pointing to.

>(And anyway, I don't think there's much danger that
>researchers in this field will utter a collective "eureka" and take up the
>cause of Ergo.)

Agreed, but an occasional grad student here and there might apply for
a job with us or ask to do a project with us.

>> 3) the degree to which a theory of syntax and its algorithms
>> cannot be implemented in a programming language is the degree to which that
>> theory and its algorithms have not been completely or correctly worked out
>> and should not be considered a mature enough theory to be included in the
>> discussion of which theory is to be preferred.
>
>See above. Perhaps it's your programming language, or algorithms, or other
>factors which have nothing to do with theories of syntax. Now, if you mean
>"preferred" for text processing, that's one thing. But to make general
>conclusions about the "best" theory is pointless.

Not at all. This becomes clear if you have a knowledge of both current
programming languages and current theories of syntax. There is no reason
they should not be working right now.

>Assuming, perhaps, that all theories of syntax had applied equal resources
>to developing parsers. But as I said above, other factors may influence the
>performance of your parser, which do not bear on the sophistication of a
>theory of syntax. You seem to insist that there's a connection between the
>performance of a parser and the maturity of a theory of syntax, and I think
>this assumption is fundamentally wrong.

I don't know how much effort you believe is required. We spent about two
man-years on two computers to create what we created in C++. If there are
grad students with a background both in C++ and in a particular theory,
this should have been done years ago for most of the theories.
>
>> It is important to recognize that EAGLES and the MUC conferences, groups that
>> are charged with the responsibility of developing standards for NLP do not
>> mention any of the following criteria and instead limit themselves to
>> largely general characteristics of user acceptance or vague categories such
>> as "rejects ungrammatical input" rather than specific proposals detailed in
>> terms of syntactic and grammatical structures and functions that are to be
>> rejected or accepted.
>...
>> If the EAGLES' standards are ever to gain any credibility and respect they are
>> going to have to be far more specific about grammatical and syntactic phenomena
>> that a system can and cannot support.
>
>I'm no expert on EAGLES, but a quick visit to their site tells me that their
>standards are for developing components that can be reusable and compatible.
>One part of this is evaluation, but it seems to me that their standards
>*encourage* different theories and methods of implementation, and do not try
>to find the one single "best" one. Since you are trying to use the
>standards in a way inconsistent with their stated goals, I am not surprised
>you find them inadequate.

The standards I propose are so basic that you simply cannot excuse any theory
of syntax from conforming to them in any programming language.

>> THE CONCLUSIONS I WILL DRAW FROM THIS ARE:
>> 1) the theory that underlies the software at Ergo Linguistic Technologies
>> is not only the best theory of syntax, but is the ONLY theory of syntax that
>> has reached a sufficiently developed state to even attempt the standards
>> described here.
>
>I reject the claim, as I described above, that the best parser represents
>the best theory of syntax. (Recall, statistics is not a theory of syntax.)
>I would accept the claim that the best parser is the best parser, and the
>standards you describe above may be useful as one way of describing "best."

See above. Everyone would prefer full parses of this material; the level
of detail and accuracy would go up tremendously.

>> 2) those who do not mention this theory in their research proposals, grant
>> applications, publications and so on are guilty of negligence (and could be
>> sued if there are grants, contracts, jobs, or other such items of material
>> value at stake and where the offerer of these jobs, grants, etc. has reason
>> to expect that the applicant is an expert in his field and is providing an
>> accurate picture of the competitive environment).
>
>Ohmigod, I better call my lawyer! I am under no legal obligation to
>recommend any particular system. If I propose that an NLP system is
>appropriate for a task, and an offerer of grants, contracts, jobs or other
>items of material value agrees enough to grant these items, then there's no
>legal negligence. My Ph.D. is an item of material value; if I don't mention
>Ergo in my dissertation, are you going to sue me? My dissertation is about
>linguistics, not parsers. Market your system on the basis of its commercial
>viability and suitability for NLP tasks; don't whine if linguists do not
>accept your theories. If you introduce yourself with such confrontational
>language, people will confront you!

I am venting a bit here. Every one of my investors and contacts puts me
through this wringer. It's just a matter of awareness of one's field: what
is out there and what works for particular tasks. For example, you could
not propose a drug that was half as effective and twice as dangerous as
Prozac and act as if you didn't know Prozac existed. You cannot recommend
to your boss something that is twice as expensive and half as effective
as the alternative.

>> 3) All current theories of syntax such as Chomsky's latest or even
>> older versions of his theory, HPSG, LFG, etc. should all be relegated to the
>> scrap heap of "wannabe" systems until such time as they have been worked out
>> in sufficient detail to allow the creation of programs that can execute their
>> algorithms to the degree required by the above standards.
>
>I cannot even respond to such remarkable statements without serious sarcasm.
>Rather than becoming insulting, I'll just make it clear that I find this
>suggestion comical.

This is merely a logical conclusion based on my previous arguments, which
you did not dissuade me from: 1) parsers are great tests of theories of
syntax, 2) the standards I propose are so simple that no theory should
be excused from them (and I think everyone has assumed that every theory
CAN handle them), and 3) any theory that cannot meet (1) and (2) simply
is not ready. The fact that those theories cannot handle those very
basic standards is not my fault.

>Only the BracketDoctor and ESE tools are available there. Will you make the
>rest of the downloads available?

Yes, that site anticipates future releases and accommodates private ones.

>> Praise for a competitor's arguments is not likely. Thus, a lack of
>> criticism should be interpreted as acceptance of these arguments.
>
>I'm compelled to argue with you, just so you don't go claim that I agree
>with you!

You are a breath of fresh air as far as I am concerned.
Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822

Tel: (808)539-3920
Fax: (808)539-3924