Corpora: corpus testing

From: Paul Llido (pllideau@yahoo.com)
Date: Tue Oct 16 2001 - 19:47:36 MET DST


    Hello Corpus list,

    I have collected a set of email messages for
    my *corpus* (30,000 sentences about specific software
    support). I'd like to know whether:

    1. this is a workable size;
    2. this size is useful for training the Brill tagger;
    3. I can build a gold standard
       out of it for testing;
    4. the size is enough for both a supervised test
       and an unsupervised test.

    I'd also like to ask for advice on testing.
    As far as I know, one first creates a gold standard
    and then divides the data into supervised and
    unsupervised sections. Is that all there is to
    preparing the material for testing, leaving aside
    the statistical measures and the question of what I
    should be testing for?
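
    To make the preparation step concrete, here is a minimal sketch in
    Python. It rests on assumptions: the gold standard is taken to be a
    list of hand-tagged sentences, each a list of (word, tag) pairs, and
    one part is held out for scoring while the rest is used for training.
    The function names, the 10% test fraction and the fixed seed are
    hypothetical choices for illustration, not part of any tagger's API.

```python
import random


def split_gold_standard(sentences, test_fraction=0.1, seed=0):
    """Shuffle hand-tagged sentences and split them into (train, test).

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    The fraction and seed are illustrative defaults, not recommendations.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(sentences)              # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def token_accuracy(gold, predicted):
    """Return the fraction of tokens whose predicted tag matches the gold tag."""
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0
```

    The training part would go to the tagger, the tagger's output on the
    words of the test part would be passed as `predicted`, and the test
    part itself as `gold`. Keeping the two parts disjoint is what makes
    the accuracy figure meaningful.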

    I'll post the replies...

    Many thanks,
    Paul Llido

    =====
    **********************************************************
    ************************************* *** Paul C Llido ***
    ** quae sursum sunt quaerite ****** pllideau@yahoo.com ***
    **********************************************************



