Corpora: Updated Results: NLP Pre-processing Suite

Richard Evans (in6087@wlv.ac.uk)
Thu, 10 Dec 1998 17:28:38 +0000

Hi everyone,

Just a note to thank everyone who replied to my query (and to add
one more response):
_____________________________________________________________________

I'm looking to download a collection of tools for extracting
information and formatting corpora prior to running my own programs
on them.

In particular, I'd like to find:

1. A PoS Tagger (also returning person, gender, and number
information),
2. A Sentence Splitter,
3. A Tokenizer,
4. An NP-Extractor,
5. A Parser.

for pre-processing English instruction manuals (I've noticed
that some robust parsers aren't geared for imperative sentences).

If anyone has any recommendations, I'd be delighted to hear them.
I'll post the results as soon as I have them.
_____________________________________________________________________

In case there's anyone else who ISN'T aware of the range of software
being used, a summary of the replies follows.

===A range of tools is available from the ever-helpful
Oliver Mason <o.mason@bham.ac.uk>:
____________________________________________________________________

http://www-clg.bham.ac.uk
____________________________________________________________________

===Several respondents (
Chris Brew <Chris.Brew@edinburgh.ac.uk>,
Colin Matheson <colin@cogsci.ed.ac.uk>,
Simone Teufel <simone@cogsci.ed.ac.uk>
)
mentioned the tools available from The University of Edinburgh's
Language Technology Group at:
____________________________________________________________________

http://www.ltg.ed.ac.uk/software
____________________________________________________________________

===Annette Preissner <noemi@dfki.de> indicated software at:
____________________________________________________________________

http://www.lpl.univ-aix.fr/projects/multext/
____________________________________________________________________

===Max Schulze <bschulze@xis.xerox.com> directed me to the tools at
the Xerox Research Center Europe. The contact there is
Ken Beesley.
____________________________________________________________________

Ken.Beesley@xrce.xerox.com
____________________________________________________________________

===Pasi Tapanainen <Pasi.Tapanainen@conexor.fi> indicated that all
but the parser are available (on what looks like a commercial
basis) from:
____________________________________________________________________

http://www.conexor.fi/info-tools.html
____________________________________________________________________

===Atro Voutilainen <voutilai@ling.helsinki.fi> and Pasi Tapanainen
showed me a parser geared for imperative sentences. When I tested
the demo, the visual FDG version looked interesting, but the
output seemed to consist of blue and red balls rather than
syntactic symbols. The sample analysis looked good, though.
____________________________________________________________________

http://www.conexor.fi/analysers.html
____________________________________________________________________

===Thorsten Brants <thorsten@CoLi.Uni-SB.DE> offered a Part of
Speech tagger at:
____________________________________________________________________

http://www.coli.uni-sb.de/~thorsten/tnt/
____________________________________________________________________

===Marc Light <light@linus.mitre.org> pointed to a site from which
a finite-state parser and a stemmer may be downloaded. It's:
____________________________________________________________________

http://www.sfs.nphil.uni-tuebingen.de/~abney
____________________________________________________________________

Thank you one and all,

 ___________________________________________
|                                           |
|               Richard Evans               |
|___________________________________________|
| Computational Linguistics Research Group, |
| School of Languages and European Studies, |
| University of Wolverhampton,              |
| UK.                                       |
|___________________________________________|