RE: [Corpora-List] text categorisation - newspaper

From: Tony Rose (tr@acl.icnet.uk)
Date: Wed Jun 18 2003 - 10:54:12 MET DST

    > We would like to find information about other projects concerning the
    > categorization of newspaper text -- in particular, we are interested in
    > the topic sets that have been used in similar projects. For example, if
    > somebody has the list of topics used in the AP text cat collection, and
    > could send us a copy, that would be extremely useful.

    The Reuters Corpus comes complete with code sets for topics, industries and
    geography, and is freely available from:
    http://about.reuters.com/researchandstandards/corpus/

    > More in general, we would be grateful for any sort of advice/information
    > that seems relevant (e.g., pointers to other text cat work on Italian,
    > etc.)

    You can find further details of the Reuters coding scheme, the
    categorisation/coding process, inter-coder consistency, etc. here:
    http://about.reuters.com/researchandstandards/corpus/LREC_camera_ready.pdf

    Cheers,
    Tony


