[Corpora-List] New Corpora from the LDC

From: LDC Office (ldc@ldc.upenn.edu)
Date: Mon Sep 30 2002 - 19:19:35 MET DST

                 * AQUAINT English News Text *

        * 2001 NIST Speaker Recognition Evaluation *

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of two new corpora.

                               *

    The AQUAINT English News Text corpus consists of English newswire text,
    drawn from three sources: the Xinhua News Service (People's Republic of
    China), the New York Times News Service, and the Associated Press
    Worldstream News Service. It was prepared by the LDC for the AQUAINT
    Project, and will be used in official benchmark evaluations conducted by
    the National Institute of Standards and Technology (NIST).

    This two-disc publication contains roughly 375 million words, corresponding
    to about 3 GB of data. The text data are separated into directories by
    source (apw, nyt, xie); within each source, data files are subdivided by
    year, and within each year, there is one file per date of collection.
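
    As a rough illustration of the layout described above, the following
    Python sketch walks a local copy of the corpus and yields one entry per
    daily file. It assumes that each source directory (apw, nyt, xie)
    contains one subdirectory per year; the announcement does not state the
    exact directory or file naming, so the root path and the function name
    iter_daily_files are hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Source directory names as given in the announcement.
SOURCES = ("apw", "nyt", "xie")


def iter_daily_files(root: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (source, year, path) for every daily file under `root`.

    Assumption: the layout is root/<source>/<year>/<one file per date>.
    Sources that are missing from `root` are skipped silently.
    """
    for source in SOURCES:
        source_dir = root / source
        if not source_dir.is_dir():
            continue
        # Sort so that years and files come back in a stable order.
        for year_dir in sorted(p for p in source_dir.iterdir() if p.is_dir()):
            for path in sorted(p for p in year_dir.iterdir() if p.is_file()):
                yield source, year_dir.name, path
```

    Counting the files per source and year then reduces to iterating over
    iter_daily_files(Path("/path/to/corpus")) and tallying the first two
    tuple fields.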
     
    For further information, please visit:

    http://www.ldc.upenn.edu/Catalog/LDC2002T31.html

    Institutions that have membership in the LDC during the 2002
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may purchase this publication for $1000.

                               *

    The 2001 NIST Speaker Recognition Evaluation is part of an ongoing
    series of yearly evaluations conducted by NIST. These evaluations
    provide an important contribution to the direction of research efforts
    and the calibration of technical capabilities. They are intended to be
    of interest to all researchers working on the general problem of
    text-independent speaker recognition.

    The single CD-ROM 2001 NIST Speaker Recognition Evaluation corpus is
    based entirely on conversational cellular telephone speech collected by
    the LDC. The files are divided into evaluation and development data.
    There are a total of 2,350 compressed speech files, all of which are
    in SPHERE format.
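
    For readers unfamiliar with SPHERE, the sketch below reads only the
    plain-text header of such a file. It relies on the general NIST SPHERE
    header layout (a "NIST_1A" line, a header-size line, "name -type value"
    fields, and a closing "end_head"), not on anything stated in this
    announcement, and it assumes the header itself is stored uncompressed.
    It does not decode or decompress the audio samples. The function name
    parse_sphere_header is hypothetical.

```python
from typing import Dict, Union

HeaderValue = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Dict[str, HeaderValue]:
    """Parse the ASCII header at the start of a SPHERE file.

    `data` must contain at least the full header (commonly 1024 bytes).
    """
    first_lines = data.split(b"\n", 2)
    if len(first_lines) < 3 or first_lines[0].strip() != b"NIST_1A":
        raise ValueError("not a SPHERE file: missing NIST_1A line")

    # The second line gives the total header size in bytes.
    header_size = int(first_lines[1].strip())
    text = data[:header_size].decode("ascii", errors="replace")

    fields: Dict[str, HeaderValue] = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, type_flag, value = parts
        if type_flag == "-i":        # integer field
            fields[name] = int(value)
        elif type_flag == "-r":      # real-valued field
            fields[name] = float(value)
        else:                        # "-sN": string of N characters
            fields[name] = value
    return fields
```

    Reading the first few kilobytes of a file in binary mode and passing
    them to parse_sphere_header should be enough to inspect fields such as
    the sample rate before choosing a decoding tool.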

    For further information, including a link to the 2001 NIST Speaker
    Recognition Evaluation website, please visit:

    http://www.ldc.upenn.edu/Catalog/LDC2002S34.html

    Institutions that have membership in the LDC during the 2002
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may purchase this publication for $400.

                               *

    If you need additional information before placing your order, or
    would like to inquire about membership in the LDC, please send email to
    <ldc@ldc.upenn.edu> or call (215) 573-1275.

            
    --------------------------------------------------------------------
    Linguistic Data Consortium Phone: (215) 573-1275
    3600 Market Street Fax: (215) 573-2175
    Suite 810 email: ldc@ldc.upenn.edu
    Philadelphia, PA 19104-2653 www: http://www.ldc.upenn.edu


