Re: Corpora: Tagsets

Rob Freeman (rjfreeman@usa.net)
Tue, 31 Mar 1998 16:49:58 +0900

By "restricted tagset" vs. "refined tagset" do you mean few tags vs.
many tags?
I think the idea is that it is best to have a number of tags equal to the
number of causal relationships between symbols in the string. Any more and
they are redundant; any fewer and you miss information. So it's all tied up
with the "entropy" or "degrees of freedom" of the symbols in the string.
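
As a rough illustration of the entropy idea, here is a minimal Python sketch
that measures the Shannon entropy of a tag sequence under a coarse and a fine
tagset. The tag names and the six-token sequences are hypothetical toy data,
not taken from any corpus or tagger, and the function name is my own.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: the same six tokens labelled with two tagsets.
coarse = ["N", "V", "N", "V", "N", "N"]
fine = ["NN", "VBZ", "NNS", "VBD", "NN", "NNP"]

print("coarse tagset:", entropy_bits(coarse), "bits per tag")
print("fine tagset:  ", entropy_bits(fine), "bits per tag")
```

The finer tagset carries more bits per tag here. Whether those extra bits are
information or redundancy depends on the structure actually present in the
string, which this toy measure cannot decide on its own.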

Actually I think there is quite a lot of work on this kind of thing. I think
that "Expectation-Maximization" and "Baum-Welch re-estimation" of "Hidden
Markov Models" of symbol strings are based on these kinds of ideas.
According to my understanding these are all ways of identifying the best set
of labels for relationships between symbols in a string. Someone who is more
familiar with those techniques might like to comment on that, though.
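
For readers who have not seen it, the following is a minimal Python sketch of
Baum-Welch re-estimation for a discrete Hidden Markov Model, with the hidden
states loosely playing the role of unlabelled "tags". It assumes a single
observation sequence, two hidden states and two symbols; the sequence, the
starting parameters and the function names are all hypothetical choices made
for illustration, not a description of any particular tagger.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes; returns the log-likelihood too."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(n):
            if t == 0:
                prev = pi[j]
            else:
                prev = sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prev * B[j][obs[t]]
        scale[t] = sum(alpha[t])  # normalise each step to avoid underflow
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n)
            ) / scale[t + 1]
    return alpha, beta, scale, sum(math.log(s) for s in scale)


def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the log-likelihood of the input model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale, loglik = forward_backward(obs, pi, A, B)
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        for j in range(n):
            # Expected number of i -> j transitions given the observations.
            num = sum(
                alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                / scale[t + 1]
                for t in range(T - 1)
            )
            new_A[i][j] = num / denom
    new_B = [[0.0] * m for _ in range(n)]
    for j in range(n):
        denom = sum(gamma[t][j] for t in range(T))
        for k in range(m):
            new_B[j][k] = sum(gamma[t][j] for t in range(T) if obs[t] == k) / denom
    return new_pi, new_A, new_B, loglik


# Hypothetical toy sequence over two symbols (0 and 1) and starting guesses.
obs = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.4], [0.3, 0.7]]

logliks = []
for _ in range(10):
    pi, A, B, ll = baum_welch_step(obs, pi, A, B)
    logliks.append(ll)

print("log-likelihood per iteration:", logliks)
```

Each update can only keep or raise the likelihood of the string, so the hidden
states drift toward whatever labelling best explains the observed symbols. The
number of states still has to be chosen in advance, which is the tagset-size
question again.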

Rob Freeman
rjfreeman@usa.net

Meunier Fanny wrote:

> Dear all,
>
> I was wondering whether (or not) studies have been published on the
> comparison of the success rates of POS taggers with a restricted tagset vs
> POS taggers with a refined tagset. Any interesting references would be most
> welcome!
> Thank you very much,