Corpora: MY LAST NLP POST (OF THIS THREAD)

Philip A. Bralich, Ph.D. (bralich@hawaii.edu)
Thu, 26 Feb 1998 09:49:34 -1000

There have been several more reactions to my post concerning whether or
not the ability of a theory of syntax to be implemented in a programming
language constitutes a fair, accurate, independent, and objective test
of a theory's scope and efficiency. In order to save bandwidth I will
respond one last time to this thread and try to cover the widest range
of criticisms possible.

I am sorry to be the one to have to bring to you news of a serious problem
in your field, but the fact remains that the theories that you have grown to
know and love over the last 30+ years have a dirty little secret: They cannot
be programmed to save their lives. This thread has taken on more of
a life than I expected, so if all are agreed I will make this the last post for
this particular thread (though not this subject I am sure). Please do not
see this as an opportunity to let your venom fly as I will respond to
posts that I feel must be responded to. I think it is easiest to frame this
in terms of arguments that are "out there" and my responses to them.

The garden path arguments and my responses:

1. The standards I have proposed have already been met. (They have not).
Not by a long shot. Just print out the standards, put a copy of Ergo
software in your pocket and then go and compare them with any parsing
system anywhere.
2. The standards I propose are idiosyncratic to Ergo's theory or they
are somehow unfair. Look at them yourself and ask whether you and most
of the field haven't believed they are commonplace expectations for
any theory or any parser.
3. Current problems with NLP have to do with working with the last 10%.
That is, the pretense is that they can already handle 90% of what needs
to be done but more is required. This is dead wrong. Parsers outside of
Ergo hardly begin to touch the standards we have proposed: few of them
do anything more than part-of-speech analysis. If you look at the output
of speech recognition systems you will see their NLP abilities are well
under 1% of the task (handling only a few hundred commands). Ergo can
improve that by another 60-80%, increasing the number of possible
commands to many thousands and making the first spoken-language
operating systems possible.
4. Parsing is not a good test of a theory, even though there has never
been a theoretical mechanism proposed that in principle could not be
programmed. Note that other NLP researchers are not anxious to argue
that their theories are better BECAUSE they cannot be programmed. That
would end virtually any hope of funding that may exist for them in the
NLP arena. Thus, I believe it is safe to say that all other syntactic
theoreticians agree wholeheartedly that programming is a good test of a
theory. I have yet to see one theoretical syntactician argue against
this claim, though it does seem that there are those in the field who
believe parsing is not a good test. (Statisticians probably--the last
thing they would want is for a theory of syntax to do better than their
number crunching.) Perhaps syntacticians with other theories would like
to take up the debate. Would a theory of math that could not be
programmed into calculators be a better theory of math because it used
less mundane criteria than formal consistency?
5. Statistics alone is sufficient to analyze the facts of human
language. Wrong: statistics will never provide sufficient information
about the internal structure of strings to manipulate structures or to
do question/answer, statement/response repartee. (Aside: Does a vote
for Ergo equal a vote against statistics? Perhaps.)
6. People will not accept NLP until disfluencies and other gaps are
handled. This is more than a little bizarre. By this logic speech
recognition should have sold nothing to date, and even current products
should be stamped as not fit for human consumption. Believe me, when
you can type or speak the following to your search engine, people will
forget about the disfluencies and gaps.
Who was the eighth President of the United States?
Hey Mickey, what time is it?
7. Parsers are too cumbersome to be made readily available to the
general public. Again not true: ours is a standard Windows 95 program
that fits on one disk (including the 75,000-word dictionary) and will
run on any 486 or better PC. If ours is not superior to the others,
they should be able to do the same.
8. There is something inherently wrong with the Penn Treebank standard.
Doesn't matter: it is a true demonstration of a parser's ability to do
part-of-speech tagging as well as a thorough analysis of internal
structure. If this is done, it shouldn't take more than a few weeks for
the programmers to convert their parser's output into the Penn Treebank
style. That is just not a big programming task. Besides, the Penn
Treebank II guidelines are the standards accepted by this field. (Of
course, we also need equivalent standards for other languages.)
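To picture what such a conversion amounts to, here is a minimal sketch
(not Ergo's code; the tree is hand-built and the rendering function is
my own illustration) of how a parser's internal tree might be emitted as
Penn Treebank II-style labeled brackets:

```python
# Render a toy parse tree as Penn Treebank-style labeled brackets.
# The tree below is hand-built for illustration; a real parser would
# produce such trees automatically.

def bracket(node):
    """Render a (label, children-or-word) tree as a labeled-bracket string."""
    label, body = node
    if isinstance(body, str):            # leaf node: (POS word)
        return f"({label} {body})"
    return "(" + label + " " + " ".join(bracket(c) for c in body) + ")"

# Hand-built parse of "John saw Mary." in Treebank-style notation.
tree = ("S", [
    ("NP-SBJ", [("NNP", "John")]),
    ("VP", [("VBD", "saw"),
            ("NP", [("NNP", "Mary")])]),
    (".", "."),
])

print(bracket(tree))
# -> (S (NP-SBJ (NNP John)) (VP (VBD saw) (NP (NNP Mary))) (. .))
```

If a parser already computes parts of speech and parts of sentence, the
remaining work really is just a rendering pass like this one.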
9. Changing one structure into another or doing q&a makes untenable
theoretical claims about the relationships between structures. Again
not so--if you have properly analyzed the internal structure of strings,
you should be able to change a question to a statement and a statement
to a question, whether or not you believe this is what goes on in the
brain. The structures are so totally predictable, one from the other,
that this too should only take a programmer a week or so (if the
analysis of internal structure has been done correctly in the first
place).
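To make the predictability claim concrete, here is a toy Python sketch
(an illustration of the principle, not Ergo's method) that undoes
subject-auxiliary inversion to turn a simple yes/no question back into
a statement. The auxiliary list and function name are my own; real
coverage would need a full structural analysis, and do-support questions
like "Did John leave?" would also require re-tensing the verb:

```python
# Toy question-to-statement transform: move the fronted auxiliary back
# after the subject. Assumes a simple yes/no question whose first word
# is a known auxiliary.

AUXILIARIES = {"is", "are", "was", "were", "can", "will", "does", "did"}

def question_to_statement(question):
    words = question.rstrip("?").split()
    aux = words[0].lower()
    if aux not in AUXILIARIES:
        raise ValueError("expected a yes/no question starting with an auxiliary")
    subject, rest = words[1], words[2:]
    # "Is John in the office?" -> "John is in the office."
    return " ".join([subject.capitalize(), aux] + rest) + "."

print(question_to_statement("Is John in the office?"))
# -> John is in the office.
```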
10. People could respond intelligently to my claims; they are just too
busy with other things or too put off by my arrogance (accuracy?).
Wrong: this is a written record respected in the community and as
available as a library book (just type my name in a Net Search if you
want to find these arguments): not to respond is to acquiesce.

There is still a serious problem underlying the lack of response from
people who know this field. For syntacticians, if they say that
theories can be tested by their ability to be implemented as a parser,
they have to produce a parser of at least equal quality to the Ergo
parser or concede ours is best; however, if they say that there are
more important issues than parsing (thereby demonstrating their theory
CANNOT be implemented in a parser), they must forever write off funds
for parsing until such time as they have amended their theory or their
opinion.

For statisticians, if they say that a theory of syntax can be parsed at
all, they are in danger of admitting there is no particular need for
statistical parsers. If they say that theories of syntax cannot create
parsers, or cannot create parsers equal to statistical parsers, they
must come up with a statistical parser that can meet or beat those very
ordinary standards that I have proposed. This is especially difficult
for them because there is no way that a statistical parser will ever
analyze internal structure to a significant enough degree to do q&a or
manipulate structures (otherwise they would have developed a theory of
syntax and would once again remove the need for statistical parsers).

Finally, download a BracketDoctor (and perhaps these arguments as
well), take it to classes or to presentations or to conferences, and
ask questions based on what it can do. If you are given straight
answers with evidence of better results from other parsers, you will
KNOW I am wrong. If anything else occurs (e.g. dead silence, dirty
looks, accusations of political incorrectness, shunning, or whatever),
you know there is substance in my arguments. Gauge my arguments not by
the intellectualized cloudiness of responses, but by the lack or
presence of physical evidence (don't go by oral reports alone) from
other parsers that can meet the standards I have provided. I have
provided very ordinary standards (repeated below) such that anyone
should be able to judge this. Look closely at the standards; you will
see they are fair and relatively simple. Then, BracketDoctor and
arguments in hand, go out and find the physical evidence yourself.

Phil Bralich

THE STANDARDS:
In addition to using the Penn Treebank II guidelines for the generation
of trees and labeled brackets, using a dictionary that is at least
35,000 words in size, working in real time, and handling sentences up
to 15 to 20 words in length, we suggest that NLP parsers should also
meet standards in the following seven areas before being considered
"complete." The seven areas are: 1) the structural analysis of strings,
2) the evaluation of acceptable strings, 3) the manipulation of
strings, 4) question/answer, statement/response repartee, 5) command
and control, 6) the recognition of the essential identity of ambiguous
structures, and 7) lexicography. (These same criteria have been
proposed for the coordination of animations with NLP to the Virtual
Reality Modeling Language Consortium--a consortium (whose standards
were recently accepted by the ISO) designed to standardize 3D
environments. See http://www.vrml.org/WorkingGroups/NLP-ANIM.)

It is important to recognize that EAGLES and the MUC conferences,
groups that are charged with the responsibility of developing standards
for NLP, do not mention any of the following criteria and instead limit
themselves to largely general characteristics of user acceptance or
vague categories such as "rejects ungrammatical input," rather than
specific proposals detailed in terms of the syntactic and grammatical
structures and functions that are to be rejected or accepted. The
EAGLES site is made up of hundreds of pages of introductory material
that is very confusing and difficult to navigate; however, once you
actually find the few standards that are being proposed, you will find
that they do not come close to the level of precision and depth that is
being proposed here, and for that reason they should be rejected until
such time as these higher and more demanding expectations of NLP
systems are included there as well. These are serious matters, and a
group like EAGLES should not ignore extant NLP tools simply because
they are not mainstream or because mainstream parsers cannot meet these
requirements (even though the Ergo parser is better known than almost
all other parsers).
Just go through their pages and try to find EXACTLY what a parser is expected
to do under these guidelines. There is almost no reference to specific
grammatical structures, the Penn Treebank II guidelines, or references to
current working parsers as models (http://www.ilc.pi.cnr.it/EAGLES/home.html).

If the EAGLES' standards are ever to gain any credibility and respect they are
going to have to be far more specific about grammatical and syntactic phenomena
that a system can and cannot support. There should also be some requirement
that the systems being judged offer a demonstration of their abilities to
generate labeled brackets and trees in the style of the Penn Treebank II
guidelines. I suggest the following as a far more exacting and far more
demanding test of systems than is offered by EAGLES or any of the MUC
conferences.

HERE IS A BRIEF PRESENTATION OF STANDARDS IN THOSE SEVEN AREAS:
1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
STRINGS, the parser should: 1) identify parts of speech, 2) identify
parts of sentence, 3) identify internal clauses (what they are and what
their role in the sentence is, as well as the parts of speech, parts of
sentence, and so on of these internal clauses), 4) identify sentence
type (without using punctuation), 5) identify tense and voice in main
and internal clauses, and 6) do 1-5 for internal clauses.
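As an illustration only (the field names and tag set here are
hypothetical assumptions of mine, not any particular parser's actual
output format), a parser meeting this standard might return a record
like the following for "John saw Mary":

```python
# Hypothetical analysis record for "John saw Mary" -- keys and values
# are illustrative, not a real parser's output.
analysis = {
    "sentence_type": "declarative",   # identified without punctuation
    "tense": "past",
    "voice": "active",
    "parts_of_speech": [("John", "NNP"), ("saw", "VBD"), ("Mary", "NNP")],
    "parts_of_sentence": {"subject": "John", "verb": "saw", "object": "Mary"},
    "internal_clauses": [],           # none in this simple sentence
}

print(analysis["sentence_type"], analysis["tense"], analysis["voice"])
# -> declarative past active
```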

2. At a minimum from the point of view of EVALUATION OF STRINGS, the parser
should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3)
give the number of correct parses identified, 4) identify what sort of items
succeeded (e.g. sentences, noun phrases, adjective phrases, etc.), 5) give the
number of unacceptable parses that were tried, and 6) give the exact time of
the parse in seconds.

3. At a minimum, from the point of view of MANIPULATION OF STRINGS, the
parser should: 1) change yes/no and information questions to statements and
statements to yes/no and information questions, 2) change actives to passives
in statements and questions and change passives to actives in statements and
questions, and 3) change tense in statements and questions.
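The manipulations in this area become mechanical once a clause has been
analyzed into its parts. A minimal sketch under that assumption (the
participle table and function name are mine, purely for illustration;
a real system would draw participles from its dictionary):

```python
# Toy active -> passive rearrangement over an already-analyzed clause.
# The participle table is a stand-in for what a 50,000-word dictionary
# would supply.

PAST_PARTICIPLE = {"arrested": "arrested", "saw": "seen", "gave": "given"}

def to_passive(subject, verb, obj):
    # ("the police", "arrested", "John") -> "John was arrested by the police"
    return f"{obj} was {PAST_PARTICIPLE[verb]} by {subject}"

print(to_passive("the police", "arrested", "John"))
# -> John was arrested by the police
```

The reverse direction, passive to active, is the same rearrangement run
backwards, which is why the point about predictability holds.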

4. At a minimum, building on the above basic set of abilities, from the
point of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the
parser should: 1) identify whether a string is a yes/no question,
wh-word question, command, or statement, 2) identify tense (and
recognize which tenses would provide appropriate responses), 3)
identify relevant parts of sentence in the question or statement and
match them with the needed relevant parts in text or databases, 4)
return the appropriate response as well as any sound or graphics or
other files that are associated with it, and 5) recognize the essential
identity between structurally ambiguous sentences (e.g. recognize that
either "John was arrested by the police" or "The police arrested John"
is an appropriate response to either "Was John arrested (by the
police)?" or "Did the police arrest John?").

5. At a minimum, from the point of view of RECOGNITION OF THE ESSENTIAL
IDENTITY OF AMBIGUOUS STRUCTURES, the parser should recognize and
associate structures such as the following: 1) existential "there"
sentences with their non-there counterparts (e.g. "There is a dog on
the porch," "A dog is on the porch"), 2) passives and actives, 3)
questions and related statements (e.g. "What did John give Mary?" can
be identified with "John gave Mary a book."), 4) possessives should be
recognized in three forms: "John's house is big," "The house of John is
big," "The house that John has is big," 5) heads of phrases should be
recognized as the same in non-modified and modified versions ("the tall
thin man in the office," "the man in the office," "the tall man in the
office," and "the tall thin man in the office" should be recognized as
referring to the same man (assuming the text does not include a
discussion of another, "short man" or "fat man," in which case the
parser should request further information when asked simply about "the
man")), and 6) others to be decided by the group.
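One way to picture "essential identity" is as normalization to a
canonical relation: each surface variant is mapped onto the same
underlying tuple, and tuples are compared. The sketch below is a
hypothetical toy keyed to the example sentences above, not a general
mechanism; the clause records are assumed to come from a prior
structural analysis:

```python
# Map surface variants of a clause onto one canonical relation tuple,
# so that structurally different but equivalent sentences compare equal.

def normalize(parsed):
    kind = parsed["type"]
    if kind == "there-existential":
        # "There is a dog on the porch" -> ("be", "a dog", "on the porch")
        return ("be", parsed["subject"], parsed["locative"])
    if kind == "copular":
        # "A dog is on the porch" -> the same tuple
        return ("be", parsed["subject"], parsed["locative"])
    if kind == "passive":
        # "John was arrested by the police" -> ("arrest", "the police", "John")
        return (parsed["verb"], parsed["agent"], parsed["patient"])
    if kind == "active":
        # "The police arrested John" -> the same tuple
        return (parsed["verb"], parsed["subject"], parsed["object"])
    raise ValueError(f"unknown clause type: {kind}")

a = normalize({"type": "there-existential", "subject": "a dog",
               "locative": "on the porch"})
b = normalize({"type": "copular", "subject": "a dog",
               "locative": "on the porch"})
print(a == b)
# -> True: the two sentences describe the same situation
```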

6. At a minimum from the point of view of COMMAND AND CONTROL, the parser
should: 1) recognize commands, 2) recognize the difference between commands
for the operating system and commands for characters or objects, and 3)
recognize the relevant parts of the commands in order to respond
appropriately.

7. At a minimum from the point of view of LEXICOGRAPHY, the parser should:
1) have a minimum of 50,000 words, 2) recognize single and multi-word lexical
items, 3) recognize a variety of grammatical features such as singular/plural,
person, and so on, 4) recognize a variety of semantic features such as
+/-human, +/-jewelry and so on, 5) have tools that facilitate the addition
and deletion of lexical entries, 6) have a core vocabulary that is suitable
to a wide variety of applications, 7) be extensible to 75,000 words for
more complex applications, and 8) be able to mark and link synonyms.

Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822

Tel: (808)539-3920
Fax: (808)539-3924