Corpora: ERGO: Parser Integrity

Anne Sing (annes@htdc.org)
Tue, 20 Jul 1999 15:48:02 -1000

Next message: eric@scs.leeds.ac.uk: "Re: Corpora: Sensible sizes for specialist corpora"
Previous message: Nicolas Masson: "Corpora: RIAO 2000 Preliminary Announcement"

To the readers.
I am a linguist working at Ergo Linguistic Technologies in Honolulu, HI.
We are currently attempting to refresh and update our collection of parsers
and parser web sites. We currently have the following parsers in our
offices: Davy Temperley, Daniel Sleator, and John Lafferty's "The Link
Grammar Parser" from Carnegie Mellon University, "LFG" from Xerox PARC,
"Apple Pie Parser" from NYU, "ENGCG Constraint Grammar Parser of English"
from Lingsoft, Inc., "The Functional Dependency Grammar Parser" of Atro
Voutilainen and Mikko Silvonen from Finland, Georgetown University's
"Natural Language Processing Parser", Stanford University's "LinGO
Parser", Prospero Software's "Parser Version 1.0 for DOS", "The
FranklinParser" from Proximity Technology, Inc., and "Natural Language
Parser Demo" from The University of Finland's Natural Language Processing
Department. If anyone knows of any other parsers, especially from
universities or high technology development corporations like IBM or
Microsoft, please let me know. We are also looking for software tools
which use parsers as an internal component. We will post a complete list
of these tools and the relevant websites on our homepage on a "related
sites" link. All feedback is welcome.

The standards by which each of the parsers listed below was judged can be
located at the Ergo Linguistic Technologies website under "parser contest".
There you will find a full explanation of what Ergo Linguistic Technologies
feels the standards for parsing technology should be. Basically, the
analysis is broken into seven different areas, each having several
objectives which need to be met. The seven categories are as follows:
structural analysis of strings; evaluation of strings; manipulation of
strings; question/answer, statement/response repartee; recognition of the
essential identity of synonymous structures; navigation and control; and
lexicography.
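
For readers who want to keep their own notes while trying the parsers, the
seven categories above could be recorded in a small scorecard. The following
is only a rough sketch: the Scorecard class and the objective labels are
hypothetical and are not taken from the Ergo website or from any parser's
software.

```python
from dataclasses import dataclass, field

# The seven evaluation areas named in the paragraph above.
CATEGORIES = (
    "structural analysis of strings",
    "evaluation of strings",
    "manipulation of strings",
    "question/answer, statement/response repartee",
    "recognition of the essential identity of synonymous structures",
    "navigation and control",
    "lexicography",
)


@dataclass
class Scorecard:
    """Hypothetical record of which objectives one parser met."""
    parser: str
    # Maps category -> {objective label: met or not}; labels are free text.
    results: dict = field(default_factory=dict)

    def record(self, category, objective, met):
        # Reject categories outside the seven listed above.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.results.setdefault(category, {})[objective] = bool(met)

    def summary(self):
        # (objectives met, objectives recorded) for each category, in order.
        return {
            c: (sum(self.results.get(c, {}).values()),
                len(self.results.get(c, {})))
            for c in CATEGORIES
        }


# Example entries based on the Link Grammar review below.
card = Scorecard("Link Grammar Parser")
card.record("structural analysis of strings", "parts of speech", True)
card.record("structural analysis of strings", "voice of main clause", False)
```

Calling card.summary() then gives a per-category count of objectives met
against objectives recorded, which makes side-by-side comparison easier.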

CARNEGIE MELLON UNIVERSITY [LINK GRAMMAR]
The Link Grammar Parser is "a syntactic parser of English, based on link
grammar, an original theory of English syntax." This parser identifies
parts of speech, parts of sentence, internal clauses and sentence type, but
identifies only tense, not voice, for the main clause and internal
clauses. It recognizes acceptable strings, gives the number of
correct parses that succeeded, identifies phrases of acceptable parses, and
gives the number of unacceptable parses that were tried, but does not give
the exact time of the parses in seconds or reject unacceptable strings. It
has no manipulation of strings. It identifies whether a string is a yes/no
question, a wh-question, or a command, but does not have any other
statement/response, question/answer repartee. It demonstrates no
recognition of the essential identity of synonymous structures and
demonstrates no navigation and control functions. The lexicon has 60,000
words and the core vocabulary is suitable to a wide variety of
applications. The parser recognizes single and multi-word items and
recognizes a variety of grammatical features. It does not have tools to
facilitate the addition, modification, or deletion of lexical entries and
it cannot mark and link synonyms and classes of lexical items. The output
of this parser is in the form of a tree diagram consisting of a series of
linkages. Each link is marked with Link Grammar's own proprietary labels.
This system was found to be rather hard to follow since at every link one
must refer back to a previous page to uncover the meaning of that
particular link.
http://bobo.link.cs.cmu.edu/grammar/html/intro.html

LINGSOFT, INC. [ENGCG CONSTRAINT GRAMMAR PARSER OF ENGLISH]
This parser, developed at the Department of General Linguistics at the
University of Helsinki, gives a morphological analysis of running English
text. It identifies the parts of speech and parts of the sentence, but
does not identify internal clauses, sentence type or tense and voice of the
main clause or internal clauses. It does identify the phrases of
successful parses, but does not recognize acceptable strings, reject
unacceptable strings, give the number of correct parses that succeeded or
the number of unacceptable parses that were tried. It also does not give
the exact time of parses in seconds. This parser performs no manipulation
of strings. It is capable of identifying whether a string is a statement,
yes/no question, wh-question or a command, but demonstrates no other
question/answer, statement/response repartee. This parser also recognizes
the heads of phrases with and without associated modifiers, but it has no
other recognition of the essential identity of synonymous structures. It
distinguishes commands from questions and statements, but does not
distinguish commands for OS characters or programs, does not provide a
sufficiently detailed analysis of commands to allow proper responses, and
it does not recognize synonymous commands. There is no data available on
the size of the lexicon, but it does recognize single and multi-word items,
recognizes a variety of grammatical features, and has a core vocabulary
that is suitable to a wide variety of applications. However, it cannot
mark and link synonyms and classes of lexical items, and it does not have
tools to facilitate the addition, modification, and deletion of lexical
entries. The output of this parser is in the form of a list which provides
a part of speech and part of sentence analysis.
http://www.lingsoft.fi/cgi-pub/engcg

UNIVERSITY OF HELSINKI [FUNCTIONAL DEPENDENCY GRAMMAR PARSER FOR ENGLISH]
This parser gives a surface-syntactic analysis of running text. It
identifies parts of speech and parts of the sentence, but does not
identify internal clauses, sentence type or tense and voice of the main
clause or internal clauses. It does identify the phrases of successful
parses, but does not recognize acceptable strings, reject unacceptable
strings, give the number of correct parses that succeeded or the number of
unacceptable parses that were tried. It also does not give the exact time
of parses in seconds. This parser performs no manipulation of strings,
and has no question/answer, statement/response repartee. Furthermore, it
does not recognize the essential identity of synonymous structures and
demonstrates no navigation and control functions. No information was
available on the size of the lexicon, but it does recognize single and
multi-word items, a variety of grammatical features, and seems to have a
core vocabulary that is suitable to a wide variety of applications.
However, it does not have any tools to facilitate the addition,
modification, or deletion of lexical entries and it is unable to mark and
link synonyms and classes of lexical items. The output of this parser
provides a part of speech and some part of sentence analysis in the form of
a list.
http://www.ling.helsinki.fi/~tapanain/dg/eng/demo.html

GEORGETOWN UNIVERSITY [NATURAL LANGUAGE PROCESSING PARSER MODULARITY
DEMONSTRATION]
This parser identifies parts of speech and parts of the sentence, but
does not identify internal clauses, sentence type or tense and voice of the
main clause or internal clauses. It recognizes acceptable strings and
rejects unacceptable strings, and gives the number of correct parses that
succeeded, but not the number of unacceptable parses that were tried. It
also identifies the phrases of acceptable parses and gives the exact time
of parses in seconds. This parser demonstrates no manipulation of strings
or question/answer, statement/response repartee. It also cannot recognize
the essential identity of synonymous structures and demonstrates no
navigation and control functions. The lexicon falls short of the suggested
minimum of 50,000 words, with only 23,000 entries. However, it does
recognize single and multi-word items as well as a variety of grammatical
features. It also has tools which facilitate the addition, modification,
and deletion of lexical items. However, its core vocabulary is not
suitable to a wide variety of applications and it is unable to mark and
link synonyms and classes of lexical items. The output of this parser
provides a part of speech analysis for each word in the sentence in the
form of a list.
http://www.georgetown.edu/cgi-bin/compling/slctscr.pl

STANFORD UNIVERSITY [LINGO]
Linguistic Grammars Online, or LinGO, is a "multi-purpose broad-coverage
grammar of English". This parser identifies parts of speech and tense and
voice of the main clause, but the output from the parse is not very clear.
It does not identify parts of the sentence, internal clauses, the tense and
voice of internal clauses, or sentence type. It recognizes acceptable
and unacceptable strings, gives the number of correct parses that
succeeded, and identifies the phrases of successful parses. However, it
does not give the number of unacceptable parses that were tried or the
exact time of parses in seconds. This parser is able to identify tense and
voice in sentences with and without internal clauses, but demonstrates no
other manipulation of strings. It identifies tense in questions, but does
not identify the appropriate tense for responses. It shows no other
question/answer, statement/response repartee. It also shows no recognition
of the essential identity of synonymous structures and demonstrates no
navigation or control functions. There was no information available on the
size of the lexicon; however, many words were found in the dictionary. It
recognizes single and multi-word items and recognizes a variety of
grammatical functions. However, it does not have tools to facilitate the
addition, modification, and deletion of lexical entries and it is not able
to mark and link synonyms and classes of lexical items. The output of this
parser provides a part of speech analysis; however, it is somewhat hard to
follow and no explanation of the labels was given. Upon corresponding
with Rob Malouf via email about this, I was referred to another web address
containing a document which explains the labels more thoroughly. This
explanation can be found at
ftp://ftp-csli.stanford.edu/linguistics/sag/mrs.ps.gz
http://hpsg.stanford.edu.8000/lingo/parser.html

PROSPERO SOFTWARE [PARSER VERSION 1.0 FOR DOS]
This parser is able to identify parts of speech, but it is not able to
identify parts of a sentence, internal clauses, sentence type, or tense and
voice of the main clause or internal clauses. This parser shows no
evaluation of strings or manipulation of strings. It does identify tense
in questions, but does not identify the appropriate tense for responses.
It demonstrates no other question/answer, statement/response repartee and
demonstrates no recognition of the essential identity of synonymous
structures or navigation and control functions. This parser has a large
dictionary with several hundred thousand entries, well above the suggested
50,000. It recognizes single and multi-word units as well as a variety of
grammatical features. The core vocabulary is suitable to a wide variety of
applications; however, the parser does not have tools to facilitate the
addition, modification or deletion of lexical entries and it is unable to
mark and link synonyms and classes of lexical items. This parser's output
provides a part of speech analysis in the form of a list.
http://www.prosperosoftware.com/np1id2.html

PROXIMITY TECHNOLOGY, INC. [FRANKLIN PARSER]
This parser can be found in Ken Litkowski's Dictionary Maintenance
Programs, also referred to as DIMAP. This parser identifies parts of speech,
parts of a sentence, and internal clauses, but it is not able to identify
sentence type or the tense and voice of main and internal clauses. The
Franklin parser does not recognize acceptable strings or reject unacceptable
strings. It also does not give the number of correct parses that succeeded,
the number of unacceptable parses that were tried, or the exact time of
parses in seconds. However, it does identify the
phrases of successful parses. It does not show any manipulation of
strings, question/answer, statement/response repartee, recognition of the
essential identity of synonymous structures, or navigation and control
functions. The dictionary includes more than 120,000 headwords and the
core vocabulary is suitable to a wide variety of applications. The parser
recognizes single and multi-word items as well as a variety of grammatical
features, but it is not able to mark and link synonyms and classes of
lexical items. This parser's output provides a part of speech and part of
sentence analysis in the form of a chart.
http://proximity.franklin.com/parse.htm

UNIVERSITY OF FINLAND'S NATURAL LANGUAGE PROCESSING DEPT. [NATURAL
LANGUAGE PARSER]
This parser has not yet been thoroughly examined.
Preliminary assessments show that the parser identifies parts of speech and
parts of sentence, but does not identify internal clauses, sentence type,
or tense and voice of main and internal clauses. It also recognizes
acceptable strings and rejects unacceptable strings. This parser is case
sensitive. It gives the number of correct parses that succeeded, but does
not give the number of unacceptable parses that were tried or the exact
time of parses in seconds. It shows no manipulation of strings,
question/answer, statement/response repartee or recognition of the
essential identity of synonymous structures. The lexicon uses a collection
of dictionaries such as CUVOALD, WordNet, and Link Grammar, so the core
vocabulary is suitable for a wide variety of applications and it recognizes
a variety of grammatical features and single and multi-word items. A more
complete analysis of this parser will follow in the near future. The
output of this parser provides a part of speech and some part of sentence
analysis in the form of a tree diagram.
http://pointti.vip.fi/nlpd.html

XEROX PARC [LFG PARSER]
This parser is currently undergoing evaluation and a complete analysis
will be posted to our website when it is available. We are in the process
of contacting Xerox PARC for more information about this product.
ftp://ftp.parc.xerox.com/pub/lfg/

NEW YORK UNIVERSITY [APPLE PIE PARSER]
This parser is currently undergoing analysis. When analysis is available,
it will be posted to our website.
http://cs.nyu.edu/cs/projects/proteus/app/index.html

UNIVERSITY OF MANITOBA [MINIPAR]
This parser was downloaded, but the demo could not be opened. Our
programmer is currently working on the problem. When an analysis is
available, it will be posted to the website.
http://www.cs.umanitoba.ca/~lindek/minipar/htm

and of course our own parser at ...

ERGO LINGUISTIC TECHNOLOGIES
http://www.ergo-ling.com/

For those of you who would like to look at and compare parsers but are
unfamiliar with parsing, you can go to the Ergo web site "Parsing
Contest" page to find good test sentences and a discussion of standards
for comparing parsers. It should take just a few hours to actually
go through, look at and try all these parsers.

Karen Smith
Linguist
Ergo Linguistic Technologies
2800 Woodlawn Dr., Ste. 175
Honolulu, HI 96822

Tel (808) 539-3920
Fax (808) 539-3924
smithkar@htdc.org
http://www.ergo-ling.com/
