Re: [Corpora-List] Part-of-speech tagger

From: Miles Osborne (miles@inf.ed.ac.uk)
Date: Tue Nov 12 2002 - 13:26:03 MET


    Quoting Afsaneh Fazly <afsaneh@cs.toronto.edu>:

    >
    > Greetings,
    >
    > I need to build a part-of-speech tagger for a new language
    > (for which there is no PoS-tagger available). For this, I need
    > to hand-annotate a minimum amount of text. I would like to know
    > how much text (minimum of course) I need to hand-tag. Also,
    > for this much text, what is the reasonable size of the tagset
    > used for annotation?
    >
    > Regards,
    >
    > Afsaneh
    >

    This is a question about the sample complexity of POS tagging. CiteSeer is
    overloaded right now, but the following paper is a good place to look:

    Shlomo Argamon-Engelson and Ido Dagan (1999). Committee-Based Sample
    Selection for Probabilistic Classifiers. Journal of Artificial
    Intelligence Research, 11.

    http://www.cs.washington.edu/research/jair/volume11/argamon99a.ps
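The idea behind committee-based sample selection is to hand-annotate only the sentences on which a committee of taggers disagrees most, rather than tagging text at random. Below is a minimal Python sketch of that loop, under loudly stated assumptions: the committee members are toy most-frequent-tag taggers built by bagging (bootstrap resampling of the labelled data), which is a simplification and not the paper's own way of generating committee members. Disagreement is measured by per-token vote entropy, averaged over the sentence. All function names and the fallback tag "NOUN" are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# A sentence is a list of tokens; a tagged sentence is a list of (token, tag) pairs.

def train_unigram_tagger(tagged_sents, default_tag="NOUN"):
    """Toy tagger (hypothetical): give each word its most frequent training tag.
    Unknown words fall back to default_tag, an assumed placeholder tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    best = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda tokens: [best.get(w, default_tag) for w in tokens]

def build_committee(tagged_sents, k, seed=0):
    """Bagging: train k taggers, each on a bootstrap resample of the labelled data."""
    rng = random.Random(seed)
    return [train_unigram_tagger([rng.choice(tagged_sents) for _ in tagged_sents])
            for _ in range(k)]

def vote_entropy(votes):
    """Entropy of the committee's tag votes for one token (0.0 = full agreement)."""
    k = len(votes)
    return sum((n / k) * math.log(k / n) for n in Counter(votes).values())

def disagreement(committee, tokens):
    """Average per-token vote entropy of the committee over one sentence."""
    outputs = [tagger(tokens) for tagger in committee]
    per_token = [vote_entropy([out[i] for out in outputs]) for i in range(len(tokens))]
    return sum(per_token) / len(per_token) if per_token else 0.0

def select_for_annotation(committee, pool, n):
    """Return the n unlabelled sentences the committee disagrees on most."""
    return sorted(pool, key=lambda s: disagreement(committee, s), reverse=True)[:n]
```

In use, one would hand-tag the selected sentences, add them to the labelled set, rebuild the committee and repeat, stopping when tagging accuracy on a held-out sample stops improving. That stopping point gives a rough, data-driven answer to "how much text is enough" for a particular language and tagset.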

    Also, at this year's CoNLL (CoNLL-2002) there was a paper on building a
    POS tagger in a single day:

    http://ilk.kub.nl/~signll/conll02/

    Miles


