Corpora: NLP AND THE BEST THEORY OF SYNTAX

Philip A. Bralich, Ph.D. (bralich@hawaii.edu)
Tue, 17 Feb 1998 09:14:28 -1000

To the readers:

On March 17th I will be giving a talk at the University of Hawaii's
Linguistics Department Tuesday Seminar entitled "The Best Theory of Syntax."
In this talk I intend to make the rather non-controversial point that the
best theory of syntax must necessarily be the one that demonstrates itself
to be most completely implemented in a programming language. I am writing
to the group to ask for references, obscure or otherwise, where this basic
proposition has been put forth before in the literature or through personal
communications. Comments, criticism, and discussion of this argument are
also welcome. I will post a summary of the references to the list. (Be
sure to mention if you do not want your name included in the summary.)
Some might argue that I am merely putting complex arguments into simple
language, but these arguments have substance and effect in either simple or
complex language. This is especially true when we are dealing with the
application of syntax to a multi-billion dollar industry such as NLP.

More specifically, I intend to present the argument that the best
independent and objective measure of a theory of syntax's overall
effectiveness is its ability to generate, in a computer program, standard
grammatical structures and to manipulate these structures in the same way
as users of the language being described. That is, I intend to argue that
the best theory of syntax is the one that produces the best parsers.
Following that I will present a very ordinary set of standards for the
evaluation of parsers; then, based on a comparison of theories using
those standards, I will argue that the theory of syntax that underlies the
Ergo Linguistic Technologies' parser is the best theory of syntax and that
all others should be relegated to the scrap heap of "wannabe" theories
until such time as they can produce equal or better parsers. The logic
that I will present to support this is:

1) if there is ever to be a way to determine which of the
competing, extant theories of syntax is preferable to the others, there must
be an independent and objective means of weighing the relative value and
completeness of these theories in terms of their ability to accomplish the
tasks they were originally designed for. Specifically, there must be an
independent and objective means of verifying which theories are indeed most
capable of expressing all and only those generalizations about language that
describe and explain the observed facts of their structure.
2) since computers have the ability to represent and execute
binary algorithms, any theory that is composed of binary algorithms should
be able to be implemented in a programming language. Thus, any theory of
syntax that has reached a level of maturity should be able to represent its
generalizations in working parsers. In fact, all programming languages and
compilers are based on early syntactic discoveries such as phrase structure
rules (Noam Chomsky is the default reference for much of that early work),
and they have already demonstrated their aptness for this sort of comparison.
3) the degree to which a theory of syntax and its algorithms
cannot be implemented in a programming language is the degree to which that
theory and its algorithms have not been completely or correctly worked out;
such a theory should not be considered mature enough to be included in the
discussion of which theory is to be preferred.
4) the theory which is most thoroughly worked out will
naturally have the most thorough and comprehensive parsing programs associated
with it, and for that reason is to be considered the best theory of syntax as
determined by this independent, objective criterion.

I will also propose a method for judging which theories have been "best"
implemented in a programming language. Specifically, I will argue that the
standards described below are the minimum standards that a theory of syntax
would have to meet in order to be able to say that it had reached some level of
maturity; this same set of criteria would also be used to determine
exactly which theories of syntax had most effectively accomplished the task
of modeling the mechanisms that generate all and only the sentences of a
language. In addition, the comparison of individual parses will of course use
the Penn Treebank II guidelines established by the Linguistic Data Consortium
at the University of Pennsylvania. Of course, any theory of syntax, whatever
its assumptions and methods, should be able to translate its structures into
the Penn Treebank style if its work is thorough and complete. The ability to
generate these labeled brackets and trees in itself constitutes a good test
of a theory's maturity.
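To make this test concrete, here is a minimal sketch (my own toy code, not
Ergo's software) of emitting Penn Treebank II-style labeled brackets from a
parse tree represented as nested tuples; the tree, labels, and function name
are illustrative assumptions:

```python
def to_brackets(node):
    """Render a (LABEL, children...) tuple as a labeled-bracket string."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"       # preterminal, e.g. (NN dog)
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"

# "The dog barks" in (simplified) Penn Treebank II notation.
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "dog")),
        ("VP", ("VBZ", "barks")))

print(to_brackets(tree))
# (S (NP (DT The) (NN dog)) (VP (VBZ barks)))
```

Any theory whose analyses can be serialized to such a nested structure can be
compared bracket-for-bracket against Treebank-style output.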

The motivation for such comparisons and standards is of course to provide an
independent and objective means of evaluation of the merits and relative
success of research in this area that can be judged and discussed not only
by those with a particular theoretical orientation, but also by those with
different theoretical backgrounds, those in different areas of linguistics,
and of course those from fields outside of linguistics who need to evaluate
and discuss such materials.

THE STANDARDS:
In addition to using the Penn Treebank II guidelines for the generation of
trees and labeled brackets, a dictionary of at least 35,000 words, and
real-time handling of sentences up to 15 to 20 words in length, we suggest
that NLP parsers should also meet standards in the
following seven areas before being considered "complete." The seven areas
are: 1) the structural analysis of strings, 2) the evaluation of acceptable
strings, 3) the manipulation of strings, 4) question/answer,
statement/response repartee, 5) command and control, 6) the recognition of
the essential identity of ambiguous structures, and 7) lexicography.
(These same criteria have been proposed for the coordination of animations
with NLP to the Virtual Reality Modeling Language Consortium, a consortium
(whose standards were recently accepted by the ISO) designed to standardize
3D environments. See http://www.vrml.org/WorkingGroups/NLP-ANIM.)

It is important to recognize that EAGLES and the MUC conferences, groups that
are charged with the responsibility of developing standards for NLP, do not
mention any of the following criteria and instead limit themselves largely
to general characteristics of user acceptance or vague categories such
as "rejects ungrammatical input," rather than specific proposals detailed in
terms of the syntactic and grammatical structures and functions that are to be
rejected or accepted. The EAGLES site is made up of hundreds of pages of
introductory material that is very confusing and difficult to navigate;
however, once you actually find the few standards that are being proposed
you will find that they do not come close to the level of precision and
depth that is being proposed here and for that reason should be rejected
until such time as these higher and more demanding levels of expectation of
NLP systems are included there as well. These are serious matters, and a
group like EAGLES should not ignore extant NLP tools simply because they are
not mainstream or because mainstream parsers cannot meet these requirements
(even though the Ergo parser is better known than almost all other parsers).
Just go through their pages and try to find EXACTLY what a parser is expected
to do under these guidelines. There is almost no reference to specific
grammatical structures, the Penn Treebank II guidelines, or references to
current working parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html).

If the EAGLES standards are ever to gain any credibility and respect, they are
going to have to be far more specific about the grammatical and syntactic phenomena
that a system can and cannot support. There should also be some requirement
that the systems being judged offer a demonstration of their abilities to
generate labeled brackets and trees in the style of the Penn Treebank II
guidelines. I suggest the following as a far more exacting and far more
demanding test of systems than is offered by EAGLES or any of the MUC
conferences.

HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS:
1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
STRINGS, the parser should: 1) identify parts of speech, 2) identify parts
of sentence, 3) identify internal clauses (what they are and what their role
in the sentence is as well as the parts of speech, parts of sentence and so on
of these internal clauses), 4) identify sentence type (without using
punctuation), 5) identify tense and voice in main and internal clauses, and 6)
do 1-5 for internal clauses.
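As a toy illustration of criterion 4 (identifying sentence type without
punctuation), the sketch below classifies a tokenized string by its first
word; the word lists and function name are my own illustrative assumptions,
and a real parser would of course consult the full structural analysis:

```python
# Toy sentence-type classifier: no punctuation consulted, first token only.
WH_WORDS = {"who", "what", "when", "where", "why", "how", "which"}
AUXILIARIES = {"do", "does", "did", "is", "are", "was", "were", "can", "will"}
BASE_VERBS = {"open", "close", "give", "put", "arrest"}  # tiny illustrative list

def sentence_type(tokens):
    first = tokens[0].lower()
    if first in WH_WORDS:
        return "wh-question"
    if first in AUXILIARIES:
        return "yes/no question"     # subject-auxiliary inversion
    if first in BASE_VERBS:
        return "command"             # imperative: bare verb first
    return "statement"

print(sentence_type("Did the police arrest John".split()))   # yes/no question
print(sentence_type("The police arrested John".split()))     # statement
```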

2. At a minimum from the point of view of EVALUATION OF STRINGS, the parser
should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3)
give the number of correct parses identified, 4) identify what sort of items
succeeded (e.g. sentences, noun phrases, adjective phrases, etc), 5) give the
number of unacceptable parses that were tried, and 6) give the exact time of
the parse in seconds.
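Criteria 3 through 6 amount to instrumenting the parser. A minimal sketch,
assuming a hypothetical parse() function that returns a list of analyses
(the function and report fields are illustrative assumptions):

```python
import time

def evaluate(parse, string):
    """Report acceptance, parse count, and wall-clock time for one string."""
    start = time.time()
    analyses = parse(string)          # hypothetical: returns list of analyses
    elapsed = time.time() - start
    return {"accepted": bool(analyses),   # criterion 1/2
            "parses": len(analyses),      # criterion 3
            "seconds": round(elapsed, 3)} # criterion 6

# Stand-in parser that "accepts" any string with a single analysis.
report = evaluate(lambda s: ["analysis"], "The dog is on the porch")
print(report["accepted"], report["parses"])
```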

3. At a minimum, from the point of view of MANIPULATION OF STRINGS, the
parser should: 1) change yes/no and information questions to statements and
statements to yes/no and information questions, 2) change actives to passives
in statements and questions and change passives to actives in statements and
questions, and 3) change tense in statements and questions.
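The statement-to-question half of criterion 1 can be sketched as auxiliary
fronting over a token list. This toy version (my assumption, not Ergo's
algorithm) requires an overt auxiliary and does not handle do-support:

```python
# Toy statement -> yes/no question via auxiliary fronting.
AUXILIARIES = {"is", "are", "was", "were", "can", "will", "has", "have"}

def statement_to_yes_no(tokens):
    """'John was arrested by the police' -> 'Was John arrested by the police'."""
    for i, tok in enumerate(tokens):
        if tok.lower() in AUXILIARIES:
            # Front the auxiliary, keep the remaining tokens in order.
            return [tok.capitalize()] + tokens[:i] + tokens[i + 1:]
    raise ValueError("no overt auxiliary; this sketch does not do do-support")

print(" ".join(statement_to_yes_no("John was arrested by the police".split())))
# Was John arrested by the police
```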

4. At a minimum, based on the above basic set of abilities, from the point
of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the parser should:
1) identify whether a string is a yes/no question,
wh-word question, command, or statement, 2) identify tense (and recognize which
tenses would provide appropriate responses), 3) identify relevant parts of
sentence in the question or statement and match them with the needed relevant
parts in text or databases, 4) return the appropriate response as well as any
sound or graphics or other files that are associated with it, and 5) recognize
the essential identity between structurally ambiguous sentences (e.g. recognize
that either "John was arrested by the police" or "The police arrested John"
are appropriate responses to either "Was John arrested (by the police)?" or
"Did the police arrest John?").

5. At a minimum from the point of view of RECOGNITION OF THE ESSENTIAL
IDENTITY OF AMBIGUOUS STRUCTURES, the parser should recognize and associate
structures such as the following: 1) existential "there" sentences with
their non-there counterparts (e.g. "There is a dog on the porch," "A dog is on
the porch"), 2) passives and actives, 3) questions and related statements (e.g.
"What did John give Mary" can be identified with "John gave Mary a book."), 4)
Possessives should be recognized in three forms, "John's house is big," "The
house of John is big," "The house that John has is big," 5) heads of phrases
should be recognized as the same in non-modified and modified versions ("the
tall thin man in the office," "the man in the office," "the tall man in the
office," and "the thin man in the office" should be recognized as referring
to the same man (assuming the text does not include a discussion of another,
"short man" or "fat man" in which case the parser should request further
information when asked simply about "the man")), and 6) others to be decided
by the group.
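One way to make criterion 2 (passive/active identity) testable is to reduce
both voices to a canonical (agent, verb, patient) triple and compare. The
sketch below is a deliberately naive flat-string heuristic of my own; the
verb list and segmentation assumptions stand in for a real structural parse:

```python
# Toy passive/active canonicalizer: both forms map to one (agent, verb, patient).
VERBS = {"arrested", "saw", "gave"}   # tiny illustrative verb list

def canonical(tokens):
    toks = [t.lower().strip(".?") for t in tokens]
    if "by" in toks and any(a in toks for a in ("was", "were")):
        # Passive: "John was arrested by the police"
        aux = next(i for i, t in enumerate(toks) if t in ("was", "were"))
        by = toks.index("by")
        return (" ".join(toks[by + 1:]),     # agent
                " ".join(toks[aux + 1:by]),  # verb
                " ".join(toks[:aux]))        # patient
    # Active: "The police arrested John"
    v = next(i for i, t in enumerate(toks) if t in VERBS)
    return (" ".join(toks[:v]), toks[v], " ".join(toks[v + 1:]))

print(canonical("John was arrested by the police.".split()))
print(canonical("The police arrested John.".split()))
```

Since both sentences yield the same triple, either can be returned as an
appropriate answer to either question form.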

6. At a minimum from the point of view of COMMAND AND CONTROL, the parser
should: 1) recognize commands, 2) recognize the difference between commands
for the operating system and commands for characters or objects, and 3)
recognize the relevant parts of the commands in order to respond
appropriately.

7. At a minimum from the point of view of LEXICOGRAPHY, the parser should:
1) have a minimum of 50,000 words, 2) recognize single and multi-word lexical
items, 3) recognize a variety of grammatical features such as singular/plural,
person, and so on, 4) recognize a variety of semantic features such as
+/-human, +/-jewelry and so on, 5) have tools that facilitate the addition
and deletion of lexical entries, 6) have a core vocabulary that is suitable
to a wide variety of applications, 7) be extensible to 75,000 words for
more complex applications, and 8) be able to mark and link synonyms.
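A lexicon meeting criteria 2 through 4 and 8 might store each (possibly
multi-word) item as a bundle of grammatical and semantic features plus
synonym links. All entries and feature names below are illustrative
assumptions, not Ergo's actual lexicon format:

```python
# Toy lexicon: feature bundles keyed by single- and multi-word items.
LEXICON = {
    "dog":      {"pos": "noun", "number": "singular", "human": False},
    "children": {"pos": "noun", "number": "plural",   "human": True},
    "give up":  {"pos": "verb", "multiword": True,
                 "synonyms": {"quit", "surrender"}},   # criterion 8
}

def lookup(item):
    """Return the feature bundle for a lexical item, or a default if unknown."""
    return LEXICON.get(item.lower(), {"pos": "unknown"})

print(lookup("Give up")["multiword"])
```

Addition and deletion tools (criterion 5) then reduce to inserting or removing
entries in such a table.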

THE CONCLUSIONS I WILL DRAW FROM THIS ARE:
1) the theory that underlies the software at Ergo Linguistic Technologies
is not only the best theory of syntax, but is the ONLY theory of syntax that
has reached a sufficiently developed state to even attempt the standards
described here.
2) those who do not mention this theory in their research proposals, grant
applications, publications and so on are guilty of negligence (and could be
sued if there are grants, contracts, jobs, or other such items of material
value at stake and where the offerer of these jobs, grants, etc has reason
to expect that the applicant is an expert in his field and is providing an
accurate picture of the competitive environment). In addition, computational
linguistics departments who do not mention these tools or use tools of this
calibre are remiss in their duty to present the full range of available
materials to their students.
3) All current theories of syntax, such as Chomsky's latest or even
older versions of his theory, HPSG, LFG, etc., should be relegated to the
scrap heap of "wannabe" systems until such time as they have been worked out
in sufficient detail to allow the creation of programs that can execute their
algorithms to the degree required by the above standards. (I do not want to
imply that the use of these theories to analyze the world's languages cannot
or has not contributed greatly to the store of knowledge about the nature of
the world's languages. As a matter of fact, the theory that we are working
with owes a tremendous debt to all the work that has come before it in the
form of these earlier theories. The only problem is that these other theories
have not yet completed their basic research and have not yet reached a level
of sufficient maturity to work with the standards described above and for that
reason can only be considered works in progress or "wannabe" theories.)

I will finish my UH talk with a demonstration of the software that has been
developed from our theory of syntax focusing on demonstrations from the
seven standards described above and handouts from the output of other
parsers. In addition to our standard demo as seen on our web site
(http://www.ergo-ling.com), I will use the tools called "The BracketDoctor"
(a device that generates labeled brackets and trees in the style of the Penn
Treebank II guidelines), "The English Sentence Enhancer" (an ESL grammar
checker), "The Logic Doctor" (a program that handles first order predicate
calculus, syllogistic reasoning, inferencing, and basic logic), and "The
Q&A Demo" (a program that shows our ability to handle question/answer,
statement/response repartee) to demonstrate our strengths using the Penn
Treebank II style trees and labeled brackets as well as practical
illustrations to demonstrate the abilities of our theory of syntax in those
seven areas. (All these tools except the "Logic Doctor" and the "Q&A Demo"
are available for free download from our web site at
http://www.ergo-ling.com or by email by writing me at bralich@hawaii.edu.
These are Windows 95 programs that fit on one disk and can be installed with
a standard setup function from WIN95.) Please be advised that these
programs are copyrighted and patent pending.

In sum, I would like to know of references and to receive comments in support
of or against the following argument: 1) that computers are the ideal devices
for comparing different theories' abilities to model the phenomena they seek
to describe (all and only the grammatical sentences of a language); 2) that
any theory that cannot be fully implemented in a programming language as
described in the standards outlined above is flawed in some way; and 3) that
the best independent and objective measure of a theory's scope, efficiency,
and effectiveness is the degree to which it can be implemented in a
programming language. (Of course, the basis for judgement will be the Penn
Treebank II guidelines and the standards described above.) Then, based on the
ability of Ergo Linguistic Technologies' tools to meet all the standards, I
suggest that the theories of Brame, Chomsky, Kaplan and Bresnan, Pollard and
Sag, Starosta, et al be set aside until such time as they can be shown to
generate programs that are as good or better than those produced at Ergo
Linguistic Technologies' offices.

Phil Bralich

P.S. We recommend that you download these tools and take them with you (on
a laptop is best, of course) to any linguistics, NLP, Computational
Linguistics, MT, or logic conference or workshop that will discuss work in
these areas. It should provide you with an interesting source of comparison
material as well as with some interesting and challenging questions for the
presenters. Of course, this may also be of value for students in their
classes. Linguistics and Computer Science departments that are currently
not committed to any particular theory of syntax or approach might want to
consider collaborative involvements with this theory as a means of producing
commercially viable products and as a source of research grants. You may
also wish to compare results in published reports with results that these
tools provide.

You may also want to email copies of one or more of these tools to
classmates, teachers, and co-workers (please avoid sending them to
competitors like a big bunch of unordered pizzas).

P.P.S. As the field of linguistics is dominated by very intelligent, very
informed individuals who are also quite competitive, you can measure
the success of this argument on the field overall by the reactions of the
readers to this post--the smaller the response, the higher the acceptance
(begrudging though it may be). That is, people are certainly willing to
criticize any argument they can, but they merely keep quiet if they cannot.
Praise for a competitor's arguments is not likely. Thus, a lack of
criticism should be interpreted as acceptance of these arguments.

Philip A. Bralich, President
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822
tel:(808)539-3920
fax:(808)539-3924
