corpus for text categorization

Margit Hippelein (hippelein@dbag.ulm.daimlerbenz.com)
Mon, 20 Mar 95 14:07:50 +0100

I'm looking for a corpus whose subtexts are labeled with text
categories.

E.g., if the corpus contains business letters, possible text
categories could be "account", "collection letter", "bid", etc.
If the corpus contains technical reports, the categories could be
the fields of work, e.g. "telecommunications engineering", "computer
science", "mechanical engineering", etc.

There shouldn't be too many categories (e.g., fewer than 10), and
the subtexts should be of similar length.

The corpus should contain texts in an Indo-European language other
than German.

Thanks for any hint!

-----
Margit Hippelein | Email: hippelein@dbag.ulm.DaimlerBenz.COM
Daimler-Benz AG | Tel.: 0731/505-2111
Ulm, Germany | Fax: 0731/505-4113