[Corpora-List] New LDC Releases

From: Linguistic Data Consortium (ldc@ldc.upenn.edu)
Date: Tue Jul 01 2003 - 16:36:13 MET DST

         * LDC2003S01 2001 Communicator Evaluation *

                     * LDC2003T10 SAID *

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of two new publications.

    1. The 2001 Communicator Evaluation is the second publication to result
    from the Communicator program. The original goals of the Communicator
    program were to support the creation of speech-enabled interfaces that
    scale gracefully across modalities, from speech-only to interfaces that
    include graphics, maps, pointing and gesture. The original vision of the
    Communicator systems included the ability of a user, during one
    ten-minute session, to plan a three-leg trip, with the three
    flights/legs on three different days, with rental car and hotel in each
    of the two "away" cities, plus dictating/sending a voice-mail message.

    The actual research that led to the data collections in 2000 and 2001
    explored ways to construct better spoken-dialogue systems, with which
    users interact via speech alone to perform relatively complex tasks such
    as travel planning. During 2000 and 2001 two large data sets were
    collected, in which users used the Communicator systems built by
    several sites to do travel planning. The 2001 Communicator Evaluation
    publication consists of all the data from the 2001 collection.

    All audio files have been converted into SPHERE format; there are 53394
    SPHERE files, totaling approximately 102 hours of audio. All SPHERE
    files are one-channel, 8 kHz, but the sample coding and format, while
    consistent for all files belonging to one site, are not consistent
    across sites (for example, some sites provided pcm, while others
    provided ulaw data). The documentation included in this distribution is
    replicated exactly as received from NIST and from the participating
    sites. This publication consists of one DVD.
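
    Because the sample coding differs from site to site, it may help to
    inspect each file's header before decoding the audio. The following is
    a minimal sketch, not part of the distribution, assuming the usual NIST
    SPHERE layout: an ASCII header that begins with "NIST_1A", gives the
    header size on the second line, and lists typed fields up to an
    "end_head" line. The function names are hypothetical.

```python
def parse_sphere_header(raw: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict of fields."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if not lines or lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    fields = {}
    # lines[1] holds the header size in bytes; fields start on the next line.
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name, type_tag, value = parts
        if type_tag == "-i":        # integer field
            fields[name] = int(value)
        elif type_tag == "-r":      # real-valued field
            fields[name] = float(value)
        else:                       # "-sN": string field of N characters
            fields[name] = value
    return fields


def read_sphere_header(path: str) -> dict:
    """Read only the header of the SPHERE file at `path` and parse it."""
    with open(path, "rb") as f:
        preamble = f.read(16)  # "NIST_1A\n" followed by the 8-byte size line
        size = int(preamble[8:16].decode("ascii").strip())
        f.seek(0)
        return parse_sphere_header(f.read(size))
```

    Grouping files by the value of the sample_coding field (for example
    pcm or ulaw) would then show which decoding each site's data needs.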

    For further information, including online documentation, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003S01

    Institutions that have membership in the LDC during the 2003
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may license this corpus for $900.

    2. SAID (A Syntactically Annotated Idiom Dataset) provides data for
    investigating the structural configurations in which English idioms are
    typically found. The assumption is that, since idioms are phrasal
    lexical items (PLIs), they will have structural properties that are
    idiosyncratic. Studying the structural properties of phrasal lexical
    items is easier when the data is syntactically annotated.

    The data was originally drawn from four dictionaries of English idioms.
    There are 13467 phrasal lexical items in this corpus. The analysis of
    the phrasal lexical items was manual, while the bracketing symmetry was
    checked computationally. SAID is available through FTP download.
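
    As an illustration of what a computational check of bracketing symmetry
    can look like, here is a minimal sketch. It is not the authors' tool:
    the function name, the bracket characters and the sample annotation in
    the usage note are all assumptions, and the actual SAID notation may
    differ.

```python
def brackets_balanced(annotation: str, pairs=None) -> bool:
    """Return True if every opening bracket has a matching, properly nested close."""
    # Assumed bracket inventory; adjust to the notation actually used.
    pairs = pairs or {"[": "]", "(": ")"}
    closers = {close: open_ for open_, close in pairs.items()}
    stack = []
    for ch in annotation:
        if ch in pairs:
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack  # leftover openers mean the bracketing is asymmetric
```

    For a hypothetical annotation such as "[VP [V kick] [NP [Det the] [N
    bucket]]]" the function returns True; dropping the final bracket makes
    it return False.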

    This corpus was authored by Koenraad Kuiper, Heather McCann, Heidi
    Quinn, Therese Aitchison, and Kees van der Veer under the sponsorship
    of the New Zealand Vice Chancellors' Committee and the University of
    Canterbury.

    For further information, including online documentation, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T10

    Institutions that have membership in the LDC during the 2003
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may license this corpus for $200.

                                      *

    If you need additional information before placing your order, or
    would like to inquire about membership in the LDC, please send email to
    <ldc@ldc.upenn.edu> or call +1 (215) 573-1275.

    --------------------------------------------------------------------
    Linguistic Data Consortium Phone: +1 (215) 573-1275
    University of Pennsylvania Fax: +1 (215) 573-2175
    3600 Market Street, Suite 810 email: ldc@ldc.upenn.edu
    Philadelphia, PA 19104-2653 www: http://www.ldc.upenn.edu


