Just a note to thank everyone who replied to my query (and to add
one more response):
_____________________________________________________________________
I'm looking to download a collection of tools for extracting
information and formatting corpora prior to running my own programs
on them.
In particular, I'd like to find:
1. A PoS Tagger (also returning person, gender, and number
information),
2. A Sentence Splitter,
3. A Tokenizer,
4. An NP-Extractor,
5. A Parser,
for pre-processing English instruction manuals (I've noticed
that some robust parsers aren't geared for imperative sentences).
If anyone has any recommendations, I'd be delighted to hear them.
I'll post the results as soon as I have them.
_____________________________________________________________________
In case there's anyone else who ISN'T aware of the range of software
being used, a summary of the replies follows.
===A range of tools is available from the ever-helpful
Oliver Mason <o.mason@bham.ac.uk>:
____________________________________________________________________
http://www-clg.bham.ac.uk
____________________________________________________________________
===Several respondents (
Chris Brew <Chris.Brew@edinburgh.ac.uk>,
Colin Matheson <colin@cogsci.ed.ac.uk>,
Simone Teufel <simone@cogsci.ed.ac.uk>
)
mentioned the tools available from The University of Edinburgh's
Language Technology Group at:
____________________________________________________________________
http://www.ltg.ed.ac.uk/software
____________________________________________________________________
===Annette Preissner <noemi@dfki.de> indicated software at:
____________________________________________________________________
http://www.lpl.univ-aix.fr/projects/multext/
____________________________________________________________________
===Max Schulze <bschulze@xis.xerox.com> directed me to the tools at
the Xerox Research Center Europe. The contact there is
Ken Beesley.
____________________________________________________________________
Ken.Beesley@xrce.xerox.com
____________________________________________________________________
===Pasi Tapanainen <Pasi.Tapanainen@conexor.fi> indicated that all
but the parser are available (on what looks like a commercial
basis) from:
____________________________________________________________________
http://www.conexor.fi/info-tools.html
____________________________________________________________________
===Atro Voutilainen <voutilai@ling.helsinki.fi> and Pasi Tapanainen
showed me a parser geared for imperative sentences. When I tested
the demo, the visual FDG version looked interesting, but the
output seemed to consist of blue and red balls rather than
syntactic symbols. The sample analysis looked good, though.
____________________________________________________________________
http://www.conexor.fi/analysers.html
____________________________________________________________________
===Thorsten Brants <thorsten@CoLi.Uni-SB.DE> offered a part-of-speech
tagger at:
____________________________________________________________________
http://www.coli.uni-sb.de/~thorsten/tnt/
____________________________________________________________________
===Marc Light <light@linus.mitre.org> pointed to a site from which
a finite state parser and a stemmer may be downloaded. It's:
____________________________________________________________________
http://www.sfs.nphil.uni-tuebingen.de/~abney
____________________________________________________________________
Thank you one and all,
_____________________________________________
| |
| Richard Evans |
|___________________________________________|
| Computational Linguistics Research Group, |
| School of Languages and European Studies, |
| University of Wolverhampton, |
| UK. |
|___________________________________________|