Corpora: Annotation tool & Arabic POS Responses

From: Mohamed Noamany (mfn@cs.nmsu.edu)
Date: Wed Jul 18 2001 - 15:43:47 MET DST

    Dear Colleagues,
            Thanks to everyone who responded. The suggestions ranged from
    training the Brill tagger to the e-mail below,
    which I am resending at the request of several people.
    It seems that I first have to prepare a manually tagged set for
    Arabic, which is what I intend to do. So here comes
    my next question: are there any annotation tools that can help
    in tagging Arabic manually?
    Thanks again,
            Mohamed F. Noamany

    On Tue, 17 Jul 2001, Oliver Mason wrote:

    > Dear Mohamed,
    >
    > I have a language-independent tagger, QTag, which can be trained using a
    > pre-tagged sample text as input. It is implemented in Java, so it should
    > handle Arabic texts alright, though I have never tested it. However, I'm
    > happy to assist you in adapting the tagger to work with Arabic!
    >
    > What you would need is either a (machine-readable) lexicon or a
    > tagged sample text. This can be used to create a resource file for the tagger,
    > which you can then use to tag other (larger) texts. If you then correct
    > any errors in the tagging, you can repeat the process with the new (larger)
    > training set, and you will end up with fewer errors.
    >
    > In an evaluation with Romanian we used a few tens of thousands of words as
    > training data and got a rate of 98% or more correct tag assignments.
    >
    > Regards,
    > Oliver Mason
    >
    > --
    > //\\ lecturer | centre for corpus linguistics | dept. of english | school of
    > //\\ humanities | the university of birmingham | edgbaston | birmingham b15
    > \\// 2tt | united kingdom | phone +44(0)121-414-6206 | fax +44(0)121-414- /\
    > \\// 5668 | web http://www.clg.bham.ac.uk | email o.mason@bham.ac.uk \/
    >
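
The train, tag, correct, retrain cycle described in the quoted message can be sketched roughly as follows. This is only a toy illustration and not QTag: the "tagger" here is a most-frequent-tag lookup, and the names train, tag, bootstrap_round, and the correct callback are hypothetical, introduced just for this sketch. The placeholder words are English; since Python strings are Unicode, the same code would accept Arabic tokens.

```python
from collections import Counter, defaultdict


def train(tagged_tokens):
    """Build a toy model from a list of (word, tag) pairs.

    Returns (lexicon, default_tag): the lexicon maps each word to its most
    frequent tag; default_tag is the most frequent tag overall and is used
    for words never seen in training.
    """
    counts = defaultdict(Counter)
    for word, tag_label in tagged_tokens:
        counts[word][tag_label] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    overall = Counter(tag_label for _, tag_label in tagged_tokens)
    default_tag = overall.most_common(1)[0][0] if overall else "UNK"
    return lexicon, default_tag


def tag(words, model):
    """Assign each word its lexicon tag, or the default tag if unseen."""
    lexicon, default_tag = model
    return [(w, lexicon.get(w, default_tag)) for w in words]


def bootstrap_round(training_set, new_words, correct):
    """One cycle: train, tag new text, correct it, grow the training set.

    `correct` stands in for the manual correction step: it receives the
    proposed (word, tag) pairs and returns the corrected pairs.
    """
    model = train(training_set)
    proposed = tag(new_words, model)
    corrected = correct(proposed)
    return training_set + corrected


# Small hand-tagged seed set (placeholder data).
seed = [
    ("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("the", "DET"), ("dog", "NOUN"), ("bird", "NOUN"),
]


def fix_barks(pairs):
    """Simulated human correction: 'barks' should be a verb."""
    return [(w, "VERB") if w == "barks" else (w, t) for w, t in pairs]


bigger = bootstrap_round(seed, ["the", "dog", "barks"], fix_barks)
retrained = train(bigger)
```

After one round the corrected text is part of the training set, so the retrained model no longer falls back to the default tag for "barks". A real tagger would also use context, which this lookup ignores.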
