Corpora: Tagged German corpus

Karlheinz Everts
Tue, 09 Sep 1997 08:58:25 -0700

A tagged German corpus has been built, comprising works of the German
author Karl May.

The corpus is fully tagged with:
- word class designator,
- lemma and
- lemma class designator.
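
To illustrate what these three annotations amount to for a single word
form, here is a minimal Python sketch. The record layout, the field
names and the tag values (ART, N, V) are assumptions made purely for
illustration; they are not the actual file format or tagset of the
corpus, and the sample tokens are not taken from it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedToken:
    """One annotated word form (hypothetical layout, not the corpus format)."""
    word_form: str    # the token as it appears in the text
    word_class: str   # word class designator of the word form
    lemma: str        # base form the word form belongs to
    lemma_class: str  # word class designator of the lemma


# Illustrative tokens with placeholder tags; not taken from the corpus.
tokens = [
    TaggedToken("Der", "ART", "der", "ART"),
    TaggedToken("Tag", "N", "Tag", "N"),
    TaggedToken("ist", "V", "sein", "V"),
    TaggedToken("verloren", "V", "verlieren", "V"),
]

# Count word forms per lemma class, one example of a basic statistic.
lemma_class_counts = Counter(t.lemma_class for t in tokens)

# Collect the lemma of every word form, in text order.
lemmas = [t.lemma for t in tokens]
```

With every word form carrying all three annotations, counts by word
class or by lemma reduce to simple tallies over such records.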

The corpus now consists of more than 1,670,000 word forms, and work is
still in progress.

Details of the construction and the results of a number of basic
statistical analyses can be found at the URL:
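"Das Karl-May-Korpus
Ein linguistisch annotiertes Korpus der Werke des Autors Karl May
und einiger seiner Zeitgenossen.
Aufbau und Analysen."
[The Karl May Corpus: a linguistically annotated corpus of the works
of the author Karl May and some of his contemporaries. Construction
and analyses.]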

A request for a short sample of the corpus can be sent to the author of
this study:

"A day on which one has not laughed is a day lost."
Karlheinz Everts
Tel.: +49 / (0)2224 / 72599