Spanish

FSANCHEZ@ccuam3.sdi.uam.es
Tue, 18 Apr 1995 11:01:24 GMT

Hi,

It is quite difficult to track and comment on all of the points related to
tagging Spanish raised in the last 24 hours, so I will concentrate on
facts.

We have developed a Spanish version of the Xerox Tagger, with some
modifications, mainly in the guesser. Nevertheless, the algorithms
remain as in the original. What is interesting is that we are using a
huge tagset with 466 tags and the system still delivers accuracy rates
similar to those accepted for other languages (95.6%). The model has
been trained on the ITU corpus and is being used to tag it, as part of
the work carried out in the CEC-funded project CRATER. The corpus so
tagged and the tagger itself (including at least an initial lexicon)
will be in the public domain at the end of the project (October this
year). In the next few days, I will send a technical report to CMP-LG.

--Fernando

------------------------------------------------------------------------
Fernando Sanchez Leon fsanchez@ccuam3.uam.es
Laboratorio de Linguistica Informatica Voice. +34 1 397 5250
Departamento de Linguistica 4109
Facultad de Filosofia y Letras Fax. +34 1 397 3930
Universidad Autonoma de Madrid
E-28049 Madrid (SPAIN)
------------------------------------------------------------------------