Re: [Corpora-List] Part-of-speech tagger

From: Rob Freeman (rjfreeman@email.com)
Date: Fri Nov 15 2002 - 01:02:38 MET

    Hello Afsaneh,

    I was away from my mail for a few days and so missed this thread.

    As others have pointed out, you don't _need_ to hand-tag anything at all. Of
    course, the tags you end up with are then the ones the algorithm gives you
    (or are selected from them), and there remains the question of "which tag
    is the one true tag".

    Personally, I eschew tags altogether as a subjective and largely irrelevant
    generalization of a structure that is much more complex and dynamic than any
    single characterization can portray.

    If you are interested, you could try my "classless" parsing algorithm. You
    just need a moderate amount of _untagged_ text, which is indexed and then
    sifted for relevant structure at parse time.
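
    To give a rough feel for "index untagged text, then look for structure at
    query time", here is a minimal sketch in Python. It is only an illustration
    of the general idea and not the algorithm behind the demo: the function
    names (build_index, similar_words), the one-word left/right context window,
    and the toy sentence are all assumptions made for this example.

```python
from collections import defaultdict

def build_index(tokens):
    """Index raw, untagged tokens by their immediate (left, right) context.

    Hypothetical helper for illustration only: returns two maps,
    context -> words seen in it, and word -> contexts it was seen in.
    """
    by_context = defaultdict(set)
    by_word = defaultdict(set)
    padded = ["<s>"] + tokens + ["</s>"]  # boundary markers (an assumption)
    for left, word, right in zip(padded, padded[1:], padded[2:]):
        by_context[(left, right)].add(word)
        by_word[word].add((left, right))
    return by_context, by_word

def similar_words(word, by_context, by_word):
    """At query time, count how many contexts each other word shares with `word`.

    No tags or fixed classes are stored; the grouping is computed on demand.
    """
    counts = defaultdict(int)
    for ctx in by_word.get(word, ()):
        for other in by_context[ctx]:
            if other != word:
                counts[other] += 1
    return dict(counts)

text = "the cat sat on the mat . the dog sat on the rug . the cat ran ."
by_context, by_word = build_index(text.split())
neighbours = similar_words("cat", by_context, by_word)
print(neighbours)  # {'dog': 1}
```

    Here "cat" and "dog" are grouped only because both occur between "the" and
    "sat" in the toy text; no tagset is defined in advance, and a different
    query word yields a different ad hoc grouping.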

    Have a look at my English demo (based on 12 million words of very raw text)
    at:

    http://www.chaoticlanguage.com

    Cheers,

    Rob Freeman

    On Tuesday 12 November 2002 9:52 am, Afsaneh Fazly wrote:
    > Greetings,
    >
    > I need to build a part-of-speech tagger for a new language
    > (for which there is no PoS-tagger available). For this, I need
    > to hand-annotate a minimum amount of text. I would like to know
    > how much text (minimum of course) I need to hand-tag. Also,
    > for this much text, what is the reasonable size of the tagset
    > used for annotation?
    >
    > Regards,
    >
    > Afsaneh


