Re: Corpora: Ego et al

Philip A. Bralich, Ph.D. (bralich@hawaii.edu)
Mon, 23 Feb 1998 09:01:18 -1000

At 08:42 PM 2/22/98 -1000, David Coniam wrote:
>I have to agree very much with Judith Klavans's suggestion of indivs writing
>to Ego and then a summary, say once a month, being posted.
>
>We are in danger, I feel, of being McDonalds-ed - I mean taking on as a
>household name something whose quality simply does not merit it, as I
>believe Ego's attempts at analysis (or lack of analysis in the majority of
>cases I gave it) illustrated.

The question of quality is again a problem for the entire industry. While
we do not get 90% of the sentences anyone might throw at us, we are doing
hundreds of times better than all the competitors. For example, current speech
recognition technology allows you to use a few hundred short commands with
their systems. With the addition of our tools that goes up to many thousands
of commands. And ours is the only system that can do that. Here is an
example of what I mean. We can parse all of the combinations in the
following just by adding 1000k of code to a speech rec system.

(could/would/can/will) (you)(please) open/get/grab/find the file/document
called/named (that/which/0 I called/named) (that/which is called/named)
555.doc.
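
To give a feel for how many distinct commands that one template covers, here
is a minimal Python sketch. It is not Ergo's parser and says nothing about how
Ergo works; it is a plain enumeration plus a regular expression, and every name
in it (expand, parse_command, COMMAND_RE) is hypothetical. It assumes one
reading of the notation above: the modal and "you" are taken together as an
optional opener, "please" is optional, and "0" means the relative pronoun may
be left out. Under that reading the template covers 960 word strings.

```python
import re
from itertools import product

# Building blocks of the template, under the reading described above (an assumption).
OPENERS = ["", "could you", "would you", "can you", "will you"]
POLITE = ["", "please"]
VERBS = ["open", "get", "grab", "find"]
NOUNS = ["file", "document"]
NAMING = (
    ["called", "named"]
    + [" ".join(w for w in (rel, "I", v) if w)
       for rel in ("that", "which", "") for v in ("called", "named")]
    + [f"{rel} is {v}" for rel in ("that", "which") for v in ("called", "named")]
)


def expand(filename):
    """Yield every command string the template allows for one file name."""
    for opener, polite, verb, noun, naming in product(
        OPENERS, POLITE, VERBS, NOUNS, NAMING
    ):
        words = [opener, polite, verb, "the", noun, naming, filename]
        yield " ".join(w for w in words if w)


COMMAND_RE = re.compile(
    r"""
    ^\s*
    (?:(?:could|would|can|will)\s+you\s+)?          # optional "could you" opener
    (?:please\s+)?                                  # optional "please"
    (?P<verb>open|get|grab|find)\s+
    the\s+(?:file|document)\s+
    (?:
        (?:that|which)\s+is\s+(?:called|named)      # "that is called"
      | (?:(?:that|which)\s+)?I\s+(?:called|named)  # "(that) I called"
      | (?:called|named)                            # bare "called"
    )\s+
    (?P<name>\S+?)                                  # file name, e.g. 555.doc
    [.?!]?\s*$                                      # optional final punctuation
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_command(text):
    """Return (verb, filename) if text fits the template, else None."""
    m = COMMAND_RE.match(text)
    if m is None:
        return None
    return m.group("verb").lower(), m.group("name")


if __name__ == "__main__":
    commands = list(expand("555.doc"))
    print(len(commands), "variants, for example:", commands[-1])
    print(parse_command("Could you please open the file that I called 555.doc."))
```

A flat pattern like this only handles the one template it was written for,
which is the limitation the paragraph above is describing.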

Currently reviewers of speech rec would rave if you had just a few
hundred more commands than your competitor. Imagine their delight when
this becomes available. Tossing David's sentences at any other speech rec
system would give far poorer results than tossing them at ours. The increase
is what is important. It is not wise to toss out bi-planes because they do not
have the abilities of jets. They are still far better than hot air balloons,
which have nothing to show in comparison.

>I would suggest that those who are in agreement with Judith's suggestion
>simply do that. And thats where we let things lie ...

Especially those who cannot argue the point, you mean. One thing that happens
is that you inadvertently encourage those who believe that people who do not
respond are acquiescing in the face of a better argument.

In addition I suggest that you all look closely at the standards that I have
proposed (appended to the end of this message) and ask yourselves these
three questions.

Hadn't I assumed that all parsers can do this already?
If they cannot, why not?
If only Ergo can meet these standards, what does it mean for the field?

Phil Bralich

>HERE IS A BRIEF PRESENTATION OF STANDARDS IN SEVEN AREAS:
>1. At a minimum, from the point of view of the STRUCTURAL ANALYSIS OF
>STRINGS, the parser should: 1) identify parts of speech, 2) identify parts
>of sentence, 3) identify internal clauses (what they are and what their role
>in the sentence is as well as the parts of speech, parts of sentence and so on
>of these internal clauses), 4) identify sentence type (without using
>punctuation), 5) identify tense and voice in main and internal clauses, and 6)
>do 1-5 for internal clauses.
>
>2. At a minimum from the point of view of EVALUATION OF STRINGS, the parser
>should: 1) recognize acceptable strings, 2) reject unacceptable strings, 3)
>give the number of correct parses identified, 4) identify what sort of items
>succeeded (e.g. sentences, noun phrases, adjective phrases, etc), 5) give the
>number of unacceptable parses that were tried, and 6) give the exact time of
>the parse in seconds.
>
>3. At a minimum, from the point of view of MANIPULATION OF STRINGS, the
>parser should: 1) change yes/no and information questions to statements and
>statements to yes/no and information questions, 2) change actives to passives
>in statements and questions and change passives to actives in statements and
>questions, and 3) change tense in statements and questions.
>
>4. At a minimum, based on the above basic set of abilities, from the point
>of view of QUESTION/ANSWER, STATEMENT/RESPONSE REPARTEE, the parser should:
>1) identify whether a string is a yes/no question, wh-word question, command
>or statement, 2) identify tense (and recognize which tenses would provide
>appropriate responses), 3) identify relevant parts of
>sentence in the question or statement and match them with the needed relevant
>parts in text or databases, 4) return the appropriate response as well as any
>sound or graphics or other files that are associated with it, and 5) recognize
>the essential identity between structurally ambiguous sentences (e.g. recognize
>that either "John was arrested by the police" or "The police arrested John"
>are appropriate responses to either, "Was John arrested (by the police)" or
>"Did the police arrest John?").
>
>5. At a minimum from the point of view of RECOGNITION OF THE ESSENTIAL
>IDENTITY OF AMBIGUOUS STRUCTURES, the parser should recognize and associate
>structures such as the following: 1) existential "there" sentences with
>their non-there counterparts (e.g. "There is a dog on the porch," "A dog is on
>the porch"), 2) passives and actives, 3) questions and related statements (e.g.
>"What did John give Mary" can be identified with "John gave Mary a book."), 4)
>Possessives should be recognized in three forms, "John's house is big," "The
>house of John is big," "The house that John has is big," 5) heads of phrases
>should be recognized as the same in non-modified and modified versions ("the
>tall thin man in the office," "the man in the office," "the tall man in the
>office" and "the tall thin man in the office" should be recognized as referring
>to the same man (assuming the text does not include a discussion of another,
>"short man" or "fat man" in which case the parser should request further
>information when asked simply about "the man")), and 6) others to be decided
>by the group.
>
>6. At a minimum from the point of view of COMMAND AND CONTROL, the parser
>should: 1) recognize commands, 2) recognize the difference between commands
>for the operating system and commands for characters or objects, and 3)
>recognize the relevant parts of the commands in order to respond
>appropriately.
>
>7. At a minimum from the point of view of LEXICOGRAPHY, the parser should:
>1) have a minimum of 50,000 words, 2) recognize single and multi-word lexical
>items, 3) recognize a variety of grammatical features such as singular/plural,
>person, and so on, 4) recognize a variety of semantic features such as
>+/-human, +/-jewelry and so on, 5) have tools that facilitate the addition
>and deletion of lexical entries, 6) have a core vocabulary that is suitable
>to a wide variety of applications, 7) be extensible to 75,000 words for
>more complex applications, and 8) be able to mark and link synonyms.
>
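
As an illustration of what the lexicography items in point 7 ask for, here is
a minimal Python sketch of a lexicon with single and multi-word entries,
grammatical and semantic features, add/delete operations, and synonym links.
It is an assumed data layout for discussion only, not a description of any
existing parser's lexicon; LexicalEntry, Lexicon and all field names are
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    form: str                                         # single or multi-word item
    pos: str                                          # part of speech
    grammatical: dict = field(default_factory=dict)   # e.g. {"number": "singular"}
    semantic: dict = field(default_factory=dict)      # e.g. {"human": False}
    synonyms: set = field(default_factory=set)        # keys of linked entries


class Lexicon:
    """Toy lexicon keyed by the lower-cased form of each entry."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.form.lower()] = entry

    def remove(self, form):
        key = form.lower()
        entry = self._entries.pop(key, None)
        if entry is None:
            return
        # Drop the back-links so no synonym points at a deleted entry.
        for other in entry.synonyms:
            if other in self._entries:
                self._entries[other].synonyms.discard(key)

    def link_synonyms(self, a, b):
        a, b = a.lower(), b.lower()
        self._entries[a].synonyms.add(b)
        self._entries[b].synonyms.add(a)

    def lookup(self, form):
        return self._entries.get(form.lower())

    def __len__(self):
        return len(self._entries)


if __name__ == "__main__":
    lex = Lexicon()
    lex.add(LexicalEntry("file", "noun", {"number": "singular"}, {"human": False}))
    lex.add(LexicalEntry("document", "noun", {"number": "singular"}, {"human": False}))
    lex.add(LexicalEntry("hot air balloon", "noun", {"number": "singular"},
                         {"human": False, "jewelry": False}))
    lex.link_synonyms("file", "document")
    print(lex.lookup("file").synonyms, len(lex))
```

A dictionary keyed by the whole form handles multi-word items such as "hot air
balloon" only at lookup time; spotting them inside running text would need a
separate matching step that this sketch leaves out.
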
Philip A. Bralich, Ph.D.
President and CEO
Ergo Linguistic Technologies
2800 Woodlawn Drive, Suite 175
Honolulu, HI 96822

Tel: (808)539-3920
Fax: (808)539-3924