Corpora: Text Classification System

From: Gabriela Cavaglia (Gabriela.Cavaglia@itri.brighton.ac.uk)
Date: Thu Jan 17 2002 - 18:27:20 MET


    Dear List members,

    Can anyone point me to a free Text Classification system?
    (More details of what I want it for below.)

    Thank you in advance for any help

    Gabriela Cavaglia
    PhD Student
    ITRI

    Measuring Corpus homogeneity
    =====================================================

    My thesis project is to measure corpus homogeneity. As part of that
    project, I have developed methods for unsupervised classification of
    documents based on text-internal evidence. I now want a supervised
    classification system which I can use to evaluate the unsupervised
    classification I have developed.

    To date, the corpus I have used for the experiments consists of 107
    documents from the BNC (about 2 million words). The idea is to use the
    BNC Index information and part of the corpus documents to produce a
    training sample, and to use the rest of the corpus documents as a test
    corpus. I would like to compare the results of the unsupervised
    classification against those from the supervised classification.
    =====================================================
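
The evaluation set-up described above (hold out part of the corpus for training, then compare the unsupervised grouping with supervised labels on the rest) could be sketched roughly as follows. This is only an illustration under stated assumptions: the document ids, the cluster numbers, and the category labels standing in for BNC Index information are all hypothetical, and cluster purity is one possible agreement measure chosen here for simplicity, not a measure named in the message.

```python
import random
from collections import Counter


def split_train_test(doc_ids, train_fraction=0.5, seed=0):
    """Shuffle document ids reproducibly and split them into (train, test)."""
    ids = sorted(doc_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def purity(cluster_of, label_of, doc_ids):
    """Fraction of documents whose label is the majority label of their cluster."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        raise ValueError("need at least one document")
    by_cluster = {}
    for d in doc_ids:
        by_cluster.setdefault(cluster_of[d], Counter())[label_of[d]] += 1
    matched = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return matched / len(doc_ids)


# Hypothetical toy data: six test documents, two clusters, two categories.
toy_clusters = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 1, "d6": 1}
toy_labels = {"d1": "fiction", "d2": "fiction", "d3": "news",
              "d4": "news", "d5": "news", "d6": "fiction"}

# 107 documents, as in the corpus described in the message.
train_ids, test_ids = split_train_test(range(107))
toy_purity = purity(toy_clusters, toy_labels, toy_clusters)
```

In practice the labels on the test documents would come from whatever supervised classifier is trained on the training sample, and a chance-corrected measure might be preferable to raw purity when the categories differ greatly in size.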


