[Corpora-List] 2 new treebanks at University of Tübingen

From: Sandra Kübler (kuebler@sfs.uni-tuebingen.de)
Date: Thu Dec 18 2003 - 17:16:16 MET

    The Division of Computational Linguistics at the Seminar fuer
    Sprachwissenschaft of the University of Tuebingen (Germany) is happy
    to announce the release of two new German language resources:

    1. The Tuebingen Treebank of Written German (TueBa-D/Z)

    The TueBa-D/Z treebank is a manually annotated German newspaper corpus
    based on data taken from the daily issues of the newspaper 'die
    tageszeitung' (taz) from May 3rd to May 7th, 1999. The annotation
    scheme distinguishes four levels of syntactic constituency: the
    lexical level, the phrasal level, the level of topological fields, and
    the clausal level. In addition to constituent structure, the annotated
    trees carry labels on the edges between nodes; these edge labels encode
    grammatical functions.
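
    As a rough illustration of the scheme described above, the following
    Python sketch models a tree whose nodes belong to one of the four
    levels and whose edges may carry a grammatical function. The class,
    the function names, the label names and the example sentence are all
    hypothetical; they are not taken from the TueBa-D/Z tag set or its
    distribution format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The four levels of constituency named in the announcement.
LEVELS = ("lexical", "phrasal", "field", "clausal")


@dataclass
class Node:
    label: str                  # node label (hypothetical names, not the real tag set)
    level: str                  # one of LEVELS
    edge: Optional[str] = None  # label on the edge to the parent: a grammatical function
    word: Optional[str] = None  # surface form, set only on lexical nodes
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


def words(node: Node) -> List[str]:
    """Return the words covered by a node, in left-to-right order."""
    if node.level == "lexical":
        return [node.word]
    covered: List[str] = []
    for child in node.children:
        covered.extend(words(child))
    return covered


def functions(node: Node) -> List[Tuple[str, str]]:
    """Collect (edge label, covered text) for every labelled edge below a node."""
    found: List[Tuple[str, str]] = []
    for child in node.children:
        if child.edge is not None:
            found.append((child.edge, " ".join(words(child))))
        found.extend(functions(child))
    return found


# Invented example sentence: "Der Hund bellt" ("The dog barks").
tree = Node("clause", "clausal", children=[
    Node("initial-field", "field", children=[
        Node("noun-phrase", "phrasal", edge="subject", children=[
            Node("determiner", "lexical", word="Der"),
            Node("noun", "lexical", edge="head", word="Hund"),
        ]),
    ]),
    Node("verb-field", "field", children=[
        Node("verb-phrase", "phrasal", edge="head", children=[
            Node("verb", "lexical", word="bellt"),
        ]),
    ]),
])

print(words(tree))
print(functions(tree))
```

    Keeping the grammatical function on the edge rather than on the node
    means the same phrase category can serve different functions in
    different trees.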

    The treebank currently comprises approximately 15 000 sentences
    (ca. 260 000 words).

    The license for TueBa-D/Z is granted free of charge for scientific
    use. For more information, please refer to:
    http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml

    2. The Tuebingen Partially Parsed Corpus of Written German (TuePP-D/Z)

    TuePP-D/Z is a collection of articles from the taz newspaper that have
    been automatically annotated with clause structure, topological
    fields, and chunks, in addition to lower-level annotation such as
    parts of speech and morphological ambiguity classes. All texts are
    processed automatically, starting from paragraph, sentence and token
    segmentation. Tokens include information about some regular types of
    named entities, including dates, telephone numbers, and number/unit
    combinations.

    The TuePP-D/Z data are based on taz newspaper articles from September
    2, 1986 to May 7, 1999, and comprise more than 200 million word
    tokens.

    The license for TuePP-D/Z is granted for a nominal fee (covering cost
    of DVD and postage) for scientific use. For more information, please
    refer to: http://www.sfs.uni-tuebingen.de/en_tuepp.shtml

    ********************************************************************

    We invite you to visit our web site and browse the resources and tools
    of the SfS:

    http://www.sfs.uni-tuebingen.de/en_nf_asc_resources.shtml


