Re: Spanish

Chris Brew (chrisbr@cogsci.ed.ac.uk)
Mon, 17 Apr 95 16:02:37 +0100

> > 1. Could someone point me toward some literature, documentation or code
> > (preferably C) relating to part of speech taggers? Specifically, I am
> > hoping to find info on Spanish taggers;
>
> The standard algorithm for part of speech tagging uses Hidden Markov
> Models, and seems to be applicable to a wide variety of languages.
>
> actually *hidden* Markov models are generally *not* used.
I was relying on Cutting, Kupiec and Sibun's description in
ANLP92, which starts: "We present an implementation of a
part-of-speech tagger based on a hidden Markov model...".
I probably should have made this clear, since the use
of a hidden model and untagged corpora makes the Xerox
tagger different from earlier ones, which need pre-tagged
input. I was calling this "the standard algorithm" on
the grounds that it is implemented in the Xerox, Acquilex
and Multext taggers.
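
For anyone who hasn't seen the technique, here is a toy sketch (in
Python, with invented probabilities, nothing to do with the Xerox
code) of the decoding half of an HMM tagger: tags are hidden states,
words are observations, and Viterbi search recovers the most probable
tag sequence. What Cutting et al. add is estimating the probabilities
from untagged text with Baum-Welch.

TAGS = ["DET", "NOUN", "VERB"]

# transition[t1][t2] = P(t2 | t1); "^" marks sentence start.
transition = {
    "^":    {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "DET":  {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}

# emission[t][w] = P(w | t), again invented.
emission = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walk": 0.6, "barks": 0.4},
}

def viterbi(words):
    """Return the most probable tag sequence for `words`."""
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (transition["^"][t] * emission[t].get(words[0], 0.0), [t])
            for t in TAGS}
    for w in words[1:]:
        new = {}
        for t in TAGS:
            p, prev = max(((best[s][0] * transition[s][t], s) for s in TAGS),
                          key=lambda x: x[0])
            new[t] = (p * emission[t].get(w, 0.0), best[prev][1] + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi(["the", "dog", "walk"]))   # -> ['DET', 'NOUN', 'VERB']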

> also, the result is that the number of tags required for a traditional
> style tagger is quite a bit larger than for English.
This is true, of course, but it is not necessarily a bad thing,
for reasons outlined in David Elworthy's paper on
"Tagset Design and Inflected Languages" from the ACL SIGDAT
workshop in Dublin (cmp-lg/9504002). Sometimes a bigger tagset
will produce better results from smaller corpora. Mostly it will
not, so in practice Ted will be right most of the time in assuming
that the larger tagset is going to require more training material.
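
To put rough numbers on that (hypothetical tagset sizes, purely for
illustration): a bigram tagger has about T^2 transition parameters
for a tagset of size T, so the appetite for training data grows
roughly with the square of the tagset.

# Hypothetical sizes: ~50 tags for a small English tagset versus
# ~500 for one that spells out gender, number and tense.
for T in (50, 500):
    print(T, "tags ->", T * T, "bigram transition parameters")
# 50 tags -> 2500; 500 tags -> 250000, a hundredfold increase.
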
>
> the result of these considerations is that the amount of training
> material is much larger than strictly necessary.
>
> these problems can be largely avoided by tagging with tuples rather
> than single tags and by doing some simple morphology so that each word
> in the source text is considered to be a pair of stem+apparent
> morphology. the tag tuples can contain the gross part of speech
> (noun, verb ...) plus additional information (tense, gender, number).
> the advantage here is that the statistical model being learned can be
> considerably simpler. for instance, where the ending of a particular
> word strongly limits its part of speech, our statistical
> model can learn this fact in a relatively universal manner instead of
> learning it over again for each word.
>
> the advantages obtained in this manner can be immense (orders of
> magnitude decrease in the size of the statistical model and amount of
> needed training data).
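
If I read the proposal correctly, it amounts to something like the
sketch below (Python, with invented Spanish-flavoured suffix rules;
real morphology would of course be more careful): each word is split
into stem plus apparent ending, and the tag becomes a tuple of major
part of speech plus features, so the ending-to-features regularity
can be learned once rather than once per word.

# Hypothetical ending -> (major POS, features) table, checked in
# order, with longer endings tried before the shorter ones they contain.
SUFFIXES = [
    ("aban", ("VERB", {"tense": "imperfect", "number": "plural"})),
    ("amos", ("VERB", {"tense": "present", "number": "plural"})),
    ("as",   ("NOUN", {"gender": "fem", "number": "plural"})),
    ("a",    ("NOUN", {"gender": "fem", "number": "sing"})),
    ("os",   ("NOUN", {"gender": "masc", "number": "plural"})),
    ("o",    ("NOUN", {"gender": "masc", "number": "sing"})),
]

def split_word(word):
    """Split a word into (stem, apparent ending, tuple-tag guess)."""
    for ending, tag in SUFFIXES:
        if word.endswith(ending) and len(word) > len(ending):
            return (word[:-len(ending)], ending, tag)
    return (word, "", ("UNKNOWN", {}))

for w in ["hablaban", "casas", "libros"]:
    print(w, "->", split_word(w))
# hablaban -> ('habl', 'aban', ('VERB', ...)), and so on; the tagger
# would then model tag-tuple transitions plus ending emissions,
# rather than one row per full word form.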

Do you have published material on this? The same idea occurs as a
tentative conclusion in Elworthy's paper. I think more detail and
numbers would be of general interest.

Chris