Corpora: shallow parser demo

Jorn Veenstra (veenstra@kub.nl)
Fri, 4 Jun 1999 11:57:46 +0200 (MET DST)

MBSP: Memory-based shallow parsing for English

The Induction of Linguistic Knowledge (ILK) group has put a Memory-Based Shallow
Parser demo online at http://ilk.kub.nl/ (follow the links Demos, then MBSP).
Please feel free to try and test it; comments are welcome!

==============================================================

Shallow parsing is a useful preprocessing step for many Natural Language
Processing applications. After shallow parsing, sentences are no longer just
sequences of words but receive some structure: groups of words that closely
belong together are marked, and specific relations between (groups of) words
are found. In contrast to full parsing, shallow parsing does not attempt to
find a structure comprising the whole sentence, so it is in general much
faster. The Memory-Based Shallow Parser (MBSP) applies several modules to an
English sentence supplied by the user. It first assigns a part-of-speech tag to
each word in the sentence (see MBT). In the next step, MBSP recognises chunks
(non-overlapping, non-embedded constituents). Finally, MBSP assigns subjects
and objects to the verbal chunks in the sentence. MBSP is trained on the Wall
Street Journal (WSJ) treebank; a link to more recent WSJ material is included.
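
To make the three stages concrete, here is a minimal Python sketch of such a
pipeline: tagging, then chunking, then subject/object assignment. It is an
illustration only and not MBSP's code. The tiny lexicon, the chunking rule and
the "nearest noun chunk" heuristic are assumptions that stand in for the
trained memory-based modules, and all names (TOY_LEXICON, tag, chunk,
assign_relations, shallow_parse) are hypothetical.

```python
# Toy stand-ins for the three MBSP stages. Hand-written rules replace the
# trained memory-based modules; nothing here is taken from MBSP itself.

TOY_LEXICON = {  # assumption: a tiny hand-made word -> tag table
    "the": "DT", "a": "DT",
    "dog": "NN", "cat": "NN", "man": "NN",
    "chased": "VBD", "saw": "VBD",
    "quickly": "RB",
}


def tag(words):
    """Stage 1: give every word a part-of-speech tag (unknown words get NN)."""
    return [(w, TOY_LEXICON.get(w.lower(), "NN")) for w in words]


def chunk(tagged):
    """Stage 2: group tagged words into flat, non-overlapping chunks."""
    chunks = []
    for word, pos in tagged:
        if pos in ("DT", "JJ", "NN"):
            kind = "NP"
        elif pos.startswith("VB"):
            kind = "VP"
        else:
            kind = "O"  # outside any chunk
        same_kind = bool(chunks) and chunks[-1][0] == kind and kind != "O"
        starts_new_np = kind == "NP" and pos == "DT"
        if same_kind and not starts_new_np:
            chunks[-1][1].append(word)  # extend the current chunk
        else:
            chunks.append((kind, [word]))  # open a new chunk
    return chunks


def assign_relations(chunks):
    """Stage 3: for each verbal chunk, take the nearest noun chunk on the
    left as subject and the nearest noun chunk on the right as object."""
    relations = []
    for i, (kind, words) in enumerate(chunks):
        if kind != "VP":
            continue
        subj = next((c[1] for c in reversed(chunks[:i]) if c[0] == "NP"), None)
        obj = next((c[1] for c in chunks[i + 1:] if c[0] == "NP"), None)
        relations.append({"verb": words, "subject": subj, "object": obj})
    return relations


def shallow_parse(sentence):
    """Run the three stages in order on a whitespace-tokenised sentence."""
    chunks = chunk(tag(sentence.split()))
    return chunks, assign_relations(chunks)


if __name__ == "__main__":
    chunks, relations = shallow_parse("The dog chased a cat")
    print(chunks)
    print(relations)
```

Note that the output is flat: chunks never nest or overlap, and no structure
spanning the whole sentence is built, which is what distinguishes this kind of
analysis from a full parse.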

We'll present a paper about memory-based shallow parsing at CoNLL in Bergen
next week.