Corpora: New Corpus

From: LDC Office (ldc@unagi.cis.upenn.edu)
Date: Wed Mar 22 2000 - 23:40:30 MET


    **********************************
    BLLIP 1987-89 WSJ Corpus Release 1
    **********************************

    LDC is pleased to announce the availability of a new
    corpus from the Brown Laboratory for Linguistic
    Information Processing (BLLIP):

      The 1987-89 Wall Street Journal (WSJ) Corpus Release 1.

    This two-CD-ROM corpus contains a complete,
    Treebank-style parsing of the three-year WSJ archive
    from the ACL/DCI corpus -- about 30 million words of
    text. The parsing and part-of-speech (POS) tagging
    for the entire archive were done using
    statistically based methods developed by Eugene
    Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale
    and Mark Johnson of BLLIP.

    This corpus both overlaps and supplements the
    1-million-word Penn Treebank collection of parsed and
    POS-tagged WSJ texts.

    Institutions that have membership in the LDC during
    the 2000 Membership Year will be able to receive this
    corpus free of charge. Nonmembers may purchase the
    BLLIP 1987-89 WSJ Corpus Release 1 for $100. All
    organizations that wish to receive this corpus must sign
    the BLLIP 1987-89 WSJ Corpus Release 1 license agreement,
    which can be retrieved from:

    http://morph.ldc.upenn.edu/Catalog/mem_agree/bllip.html

    If you would like to order a copy of this corpus,
    please email your request to <ldc@ldc.upenn.edu>. If
    you need additional information before placing your
    order, or would like to inquire about membership in
    the LDC, please send email or call (215) 898-0464.


