[Corpora-List] New LDC Publications

From: LDC Office (ldc@ldc.upenn.edu)
Date: Wed Nov 13 2002 - 20:43:32 MET


      * Buckwalter Arabic Morphological Analyzer Version 1.0 *

                   * Voicemail Corpus Part II *

                 * 1997 HUB5 German Evaluation *

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of three new publications.

    1. The Buckwalter Arabic Morphological Analyzer Version 1.0 was
    created by Tim Buckwalter at Qamus for POS-tagging Arabic text.
    The analyzer consists primarily of three Arabic-English lexicon files:
    prefixes, suffixes, and stems. The lexicons are supplemented by three
    morphological compatibility tables used for controlling prefix-stem
    combinations, stem-suffix combinations, and prefix-suffix combinations.
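
    The interplay of the three lexicons and three compatibility tables
    described above can be sketched roughly as follows. This is only a
    minimal illustration and not the released implementation: the lexicon
    entries, the category names (PREF_CONJ, STEM_NOUN, and so on), the
    function name analyze, and the table contents are all hypothetical toy
    data, and the real file formats may differ.

```python
# Toy lexicons (hypothetical entries): surface form -> [(category, gloss)].
PREFIXES = {"": [("PREF_NONE", "")], "w": [("PREF_CONJ", "and")]}
STEMS = {"ktAb": [("STEM_NOUN", "book")]}
SUFFIXES = {"": [("SUFF_NONE", "")], "h": [("SUFF_POSS", "his/its")]}

# Toy compatibility tables: sets of category pairs allowed to co-occur.
PREFIX_STEM = {("PREF_NONE", "STEM_NOUN"), ("PREF_CONJ", "STEM_NOUN")}
STEM_SUFFIX = {("STEM_NOUN", "SUFF_NONE"), ("STEM_NOUN", "SUFF_POSS")}
# The pair (PREF_CONJ, SUFF_POSS) is left out purely to demonstrate
# filtering; this is not a linguistic claim.
PREFIX_SUFFIX = {
    ("PREF_NONE", "SUFF_NONE"),
    ("PREF_NONE", "SUFF_POSS"),
    ("PREF_CONJ", "SUFF_NONE"),
}


def analyze(word):
    """Return every prefix/stem/suffix split licensed by all three tables."""
    analyses = []
    # Try every way of cutting the word into prefix + stem + suffix.
    for i in range(len(word) + 1):
        for j in range(i, len(word) + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            for p_cat, p_gloss in PREFIXES.get(prefix, []):
                for s_cat, s_gloss in STEMS.get(stem, []):
                    for x_cat, x_gloss in SUFFIXES.get(suffix, []):
                        # Keep the split only if every pairwise check passes.
                        if (
                            (p_cat, s_cat) in PREFIX_STEM
                            and (s_cat, x_cat) in STEM_SUFFIX
                            and (p_cat, x_cat) in PREFIX_SUFFIX
                        ):
                            glosses = (p_gloss, s_gloss, x_gloss)
                            gloss = " + ".join(g for g in glosses if g)
                            analyses.append((prefix, stem, suffix, gloss))
    return analyses
```

    With this toy data, analyze("wktAb") yields a single analysis glossed
    "and + book", while analyze("wktAbh") yields none, because the
    prefix-suffix table in the sketch does not list that pair.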

    The LDC is releasing this software under the GNU General Public License:

    http://www.gnu.org/copyleft/gpl.html

    For information on commercial use, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002L49

    The Buckwalter Arabic Morphological Analyzer can be downloaded free of
    charge from the above link. If you would like a copy on CD-ROM, please
    note that there is a $100 media charge.

    2. The Voicemail Corpus Part II is the second voicemail corpus created
    by Mukund Padmanabhan, Brian Kingsbury et al. at International Business
    Machines. This single-disc publication comprises speech and
    transcript files and is divided into training and evaluation data.
    The training data consists of 2048 voicemail messages and the
    corresponding transcript files; the evaluation data consists of 50
    voicemail messages and 50 transcripts.

    For further information, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002S35

    Institutions that have membership in the LDC during the 2002 Membership
    Year will be able to receive this corpus free of charge. As a 'Members
    Only' publication, the corpus is not available to nonmembers.

    3. The 1997 Hub5 Non-English evaluation is part of an ongoing series of
    periodic evaluations conducted by NIST. These evaluations provide an
    important contribution to the direction of research efforts and the
    calibration of technical capabilities. They are intended to be of
    interest to all researchers working on the general problem of
    conversational speech recognition.

    The Hub5 Non-English evaluation focuses on the task of transcribing
    conversational telephone speech into text. The 1997 HUB5 German
    Evaluation is a single-disc publication containing nine hours of
    speech data. Transcripts are not included.

    For more information, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002S24

    Institutions that have membership in the LDC during the 2002 Membership
    Year will be able to receive this corpus free of charge. Nonmembers may
    purchase this publication for $1000.

                                *

    If you need additional information before placing your order, or
    would like to inquire about membership in the LDC, please send email to
    <ldc@ldc.upenn.edu> or call (215) 573-1275.

    --------------------------------------------------------------------
    Linguistic Data Consortium       Phone: (215) 573-1275
    3600 Market Street               Fax:   (215) 573-2175
    Suite 810                        email: ldc@ldc.upenn.edu
    Philadelphia, PA 19104-2653      www:   http://www.ldc.upenn.edu


