Re: Corpora: Locating sources of corpora

From: Christopher Cieri (ccieri@ldc.upenn.edu)
Date: Thu Jul 27 2000 - 19:51:11 MET DST


    Sam,

    In case you have not already done this, you might have a look at LDC's
    Catalog (http://www.ldc.upenn.edu/Catalog). We have 168 corpora
    available at the moment and add about 20 per year. Most of our English
    text corpora focus on news since news text is relatively easy to acquire
    in large volume and covers a variety of topics. LDC also does data
    collection and annotation for specific projects or sponsors provided
    that we retain the right to share the data with our research
    communities.

    Best wishes,
    Chris

    --
    Christopher Cieri
    Executive Director, Linguistic Data Consortium
    3615 Market Street, Philadelphia, PA 19104-2608 USA
    phone: 215-573-5489, fax: 215-573-2175
    mailto:Christopher.Cieri@ldc.upenn.edu
    http://www.ldc.upenn.edu
    

    Sam Chiles wrote:

    > Hello all
    >
    > I am new to the world of Corpora and have recently been
    > recruited to locate sources of Corpora for a new library in
    > development by Microsoft. They are currently licensing English
    > language text data covering any subject to use for linguistic
    > software, such as grammar checkers. Could anyone give me a few
    > pointers toward any type of corpora that could be available for use by
    > Microsoft?
    >
    > Thank you
    > Sam
    >
    > Sam Chiles
    > E-mail sam.chiles@virgin.net



