The most widely used and conveniently available tagger is one implemented
in Common Lisp by a group at Xerox PARC. It comes with documentation
and works very well. It was available from ftp://parcftp.xerox.com/pub/tagger
when I last looked. The latest version is tagger-1.2.tar.Z, which I haven't
tried. If you need to install Common Lisp to run it, there are several
good free implementations.
(See the Association of Lisp Users home page:
http://www.cs.rochester.edu/users/staff/miller/alu.html)
There is a paper by Briscoe, Grefenstette, Padro and Serail
called "Hybrid Techniques for Training HMM POS Taggers" which includes
work on Spanish. I have an early draft published as Rank Xerox
Research Centre Report MLTT-007, but it has probably been
published somewhere else as well. Contact grefen@xerox.fr for a
copy of the latest version. The paper is listed in but not
directly available from
http://www.xerox.fr/grenoble/mltt/reports/home.html
(non-constituent coordination, yes!!! PP/pn --> PP/pn and PP/pn,
and I didn't do it on purpose)
>
> 2. Does anybody know of a Spanish corpus marked up for part of speech,
> or even something in the format of a Spanish lexicon, which is available
> on-line for public consumption (To be used, hopefully, in the creation
> of a Spanish tagger)?
The Briscoe et al. paper reports a 17k-word tagged corpus, and gives
a reference to I. Moreno-Torres (1994), "A Morphological
Disambiguation Tool: application to Spanish", Aquilex-II Working
Paper 24, Universitat Politècnica de Catalunya. I don't know
if that is publicly available. Please let me know if you find
out anything more.
> Any information which you can supply would be greatly appreciated.
>
Chris
The Language Technology Group of the Human Communication Research Centre
(a UK ESRC-funded interdisciplinary institution spanning several
departments of the Universities of Durham, Edinburgh and Glasgow)
provides a free enquiry service for Natural Language Software. More
extensive support and help are available by negotiation.
A WWW interface is available on:
http://www.cogsci.ed.ac.uk/~chrisbr/langsoft.html.
This address may change, since we hope to integrate our services
with those of other initiatives in Europe.
------------------------------------------------------------------
Dr Chris Brew,
Language Technology Group,
The University of Edinburgh
Human Communication Research Centre
------------------------------------------------------------------
Email: Chris.Brew@edinburgh.ac.uk
Work Address: HCRC, 2 Buccleuch Place, Edinburgh EH8 9LW
Scotland
Work Telephone: +44 131 650 4631
Work fax: +44 131 650 4587
------------------------------------------------------------------
Home Address: 13 Kilmaurs Road, Edinburgh EH16 5DA
Scotland
Home Telephone: +44 131 662 0574
------------------------------------------------------------------