Re: Corpora: NLP and Syntax in the Classroom

Philip A. Bralich, Ph.D. (bralich@hawaii.edu)
Tue, 3 Feb 1998 10:49:50 -1000

At 08:21 AM 2/3/98 -1000, Manaris Bill Z wrote:
>I do not doubt that this tool may be useful to some. Many NLP tools are.
>However, based on the verbiage used to promote the tool (quoted below),
>this novel approach sounds all too familiar. I am afraid there is nothing
>new to a system that takes advantage of synonymy in its linguistic model.
>Burton's SOPHIE did this in the 70's. A major issue is overgeneration,
>i.e., dealing with input that the system recognizes when it shouldn't; for
>instance, given the example (of a rule?) provided in the announcement:

The use of synonymy is not what is being pointed out here; rather, it
is the ability to combine synonymy and parsing to avoid the problems you
discuss below, as well as the problem of exponential growth in possible
parses. I recommend you try the BracketDoctor yourself
(http://www.ergo-ling.com) and you will see that we are not talking about
the mere use of synonymy. If we were, then every speech recognition system
in the world could handle exactly the range of commands shown below, and
they cannot. If you can sign a non-disclosure agreement, I can get you a
copy of our "MemoMaster", which handles this level of ambiguity and
actually sends messages without the problem you cite. I agree that wasting
bandwidth is not healthy, so I recommend to you and to all the readers of
this list: before you dismiss these tools with an intellectual argument,
take a look at the actual tools and you will see for yourself that the
arguments do not hold water.

> send/mail/email Bob a message/email/letter/memo/fax (that/which says)
> saying, "meeting at five"
>
>the system would accept the input:
>
> email Bob a email which says saying, "meeting at five"
>
>which makes no sense; even worse, this "overgenerated" sentence could
>actually mean something completely different from what the system developer
>had intended. Put that in a command-and-control situation and you have a
>serious problem.

The overgenerated sentence would not occur, so it could not cause any problems.
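
For readers who have not met the term, the overgeneration concern raised in
the quoted text can be sketched in a few lines of Python. This is only an
illustration of a naive slot-and-synonym template read literally from the
quoted example; it is not the Ergo parser, MemoMaster, or Dr. Manaris's
system, and the names PATTERN and accepts are hypothetical.

```python
import re

# Hypothetical, literal reading of the quoted template: every slot is a
# free choice among synonyms, the "(that/which says)" clause is optional,
# and "saying" always follows. Nothing checks the slots against each other.
PATTERN = re.compile(
    r'^(?:send|mail|email) (\w+) a (?:message|email|letter|memo|fax)'
    r'(?: (?:that|which) says)? saying, "(.+)"$'
)

def accepts(command: str) -> bool:
    """Return True if the naive template matches the whole command."""
    return PATTERN.match(command) is not None

# A sensible command is accepted.
print(accepts('send Bob a memo saying, "meeting at five"'))

# So is the overgenerated one from the quoted message, because the
# template has no notion of agreement or redundancy between slots.
print(accepts('email Bob a email which says saying, "meeting at five"'))
```

Both calls report a match under this naive reading, which is the behavior
the quoted objection describes.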

>Actually, there have been several papers that describe such an approach to
>language modeling/parsing (other than Burton's), so I am somewhat skeptical
>with respect to the novelty of the approach (at least based on the
>examples/discussion provided).

Again, synonymy is not what we are offering. The papers cited above have
never resulted in parsers that could actually provide Penn Treebank style
bracketings and trees, nor could they provide the actual tools that were
promised. We, on the other hand, are offering tools for your personal review
that make our claims for us. This is really our main point.

>Again I do not doubt that the system may be useful in certain domains.

I strongly recommend that critics try these tools themselves before suggesting
there are problems. There certainly are problems, but they are far fewer and
have far less of an impact than intellectualizing about them might imply.
Certainly, I realize that the 35 years of doubtful achievement in NLP lends
itself to skepticism, but please do not dismiss these tools so easily. Try them
yourselves. I did not send Dr. Manaris the executable in response to this
letter because it requires 1000K of space. However, if he would like it, it is
available at the site mentioned above or by writing me at
(bralich@hawaii.edu). MemoMaster, which actually does the messaging in the
example, is available to those who can sign an NDA.

Phil Bralich

Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822

Tel: (808)539-3920
Fax: (808)539-3924