Corpora: Looking for syntactically parsed corpora in English, French, and German

From: Rene.Valdes@lhsl.com
Date: Wed Aug 01 2001 - 23:32:51 MET DST


    Both parsers were developed using data available from the Penn Treebank, a
    syntactically tagged corpus which includes the Wall Street Journal (WSJ)
    Penn Treebank Corpus and the Penn Treebank Brown Corpus.

    The Penn Treebank Project annotates naturally occurring text for linguistic
    structure. Most notably, we produce skeletal parses showing rough syntactic
    and semantic information -- a bank of linguistic trees. We also annotate
    text with part-of-speech tags, and for the Switchboard corpus of telephone
    conversations, dysfluency annotation. We are located in the LINC Laboratory
    of the Computer and Information Science Department at the University of
    Pennsylvania.
    All data produced by the Treebank is released through the Linguistic Data
    Consortium.
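
    As an illustration only, the skeletal parses described above are commonly
    exchanged as bracketed text, one labelled tree per sentence. The sketch
    below reads one such tree into nested Python tuples. The sample sentence
    and the function names parse_bracketed and leaves are hypothetical and
    are not part of any Treebank tooling; well-formed input with balanced
    brackets is assumed.

```python
def parse_bracketed(text):
    """Parse one bracketed tree into (label, children) tuples.

    Assumes well-formed input such as "(S (NP (DT The) (NN cat)) (VP (VBD sat)))".
    """
    # Put spaces around brackets so a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]           # constituent or part-of-speech label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the end of the tree")
    return tree


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    _label, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Hypothetical sample sentence, not taken from the corpus.
sample = "(S (NP (DT The) (NN cat)) (VP (VBD sat)))"
tree = parse_bracketed(sample)
print(tree[0], leaves(tree))
```

    Nested tuples keep the sketch dependency-free; a real project would more
    likely rely on an existing tree reader from an NLP toolkit.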


