[Corpora-List] POS tagging without training data?

From: Gerhard van Huyssteen (AFNGBVH@puknet.puk.ac.za)
Date: Wed May 21 2003 - 18:38:25 MET DST

    Dear list members,

    We want to develop a POS tagger for Afrikaans. We only have very small
    corpora (roughly 1.5 million words in total), none of which is
    annotated. Our only annotated resource is a tagged lexicon, which
    contains no context. We're considering adapting an existing tagger for,
    say, English or Dutch, in order to create training data. We want to know:

    (1) Which "shell" (e.g. Brill, TnT, TiMBL or TOSCA) would be the most
    effective and efficient to use to create training data? And how much
    initial training data (i.e. manually tagged data) is needed to do this?
    (2) How much training data is needed to develop a reasonably accurate
    (let's say 95%) version of, for example, a Brill tagger for Afrikaans?

    Thanks in advance for your help. We'll post a summary.

    Yours,
    Gerhard van Huyssteen & Sulene Pilon

    __________________________________________________________
    __________________________***_____________________________
    Dr Gerhard B van Huyssteen
    School for Languages || Potchefstroom University for CHE ||
    POTCHEFSTROOM || 2531 || South Africa
    Skool vir Tale || Potchefstroomse Universiteit vir CHO || POTCHEFSTROOM
    || 2531 || Suid-Afrika

    Tel: +27 18 299 1488
    Fax: +27 18 299 1562
    afngbvh@puknet.puk.ac.za
    __________________________________________________________
    __________________________***_____________________________



