[Corpora-List] New LDC Publications

From: Linguistic Data Consortium (ldc@ldc.upenn.edu)
Date: Mon May 19 2003 - 22:40:34 MET DST

                                 LDC2003S03
              * Korean Telephone Conversations Speech *

                                 LDC2003T08
            * Korean Telephone Conversations Transcripts *

                                 LDC2003L02
              * Korean Telephone Conversations Lexicon *

                                 LDC2003P01
              * Korean Telephone Conversations Complete Set *

    The Linguistic Data Consortium (LDC) is pleased to announce the
    availability of several new publications.

    1. The Korean Telephone Conversations Speech corpus was originally
    recorded as part of the CALLFRIEND project. The conversations were
    collected by the Linguistic Data Consortium primarily in support of the
    Language Identification (LID) project, sponsored by the U.S. Department
    of Defense.

    The Korean Telephone Conversations Speech corpus consists of 100
    telephone conversations between native speakers of Korean. Of these, 49
    were published by the LDC in 1996 as LDC96S54 CALLFRIEND Korean; 51
    conversations are previously unreleased material. The recorded
    conversations last up to 30 minutes.

    There are 100 speech files, totaling approximately 44 hours of audio.
    All speech files are in SPHERE format (shorten-compressed), recorded in
    2-channel u-law with a sampling rate of 8 kHz. This publication consists
    of three CD-ROMs.
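
    As a rough check on the figures above, the uncompressed size of the
    audio can be estimated from the stated format. The sketch below assumes
    8-bit u-law samples (one byte per sample per channel) and ignores SPHERE
    file headers; the function name is illustrative and not part of any LDC
    tool.

```python
# Back-of-the-envelope size estimate for the speech corpus.
# Assumptions: u-law stores one byte per sample; file headers are ignored.

HOURS = 44              # approximate total from the announcement
SAMPLE_RATE_HZ = 8000   # 8 kHz
CHANNELS = 2            # one channel per side of the call
BYTES_PER_SAMPLE = 1    # 8-bit u-law


def uncompressed_bytes(hours, rate=SAMPLE_RATE_HZ, channels=CHANNELS,
                       width=BYTES_PER_SAMPLE):
    """Return the raw audio size in bytes for the given duration."""
    return int(hours * 3600 * rate * channels * width)


total = uncompressed_bytes(HOURS)
print(f"{total / 1e9:.2f} GB")  # prints "2.53 GB"
```

    At roughly 2.5 GB uncompressed, the audio would not fit on three
    CD-ROMs at the usual 650 to 700 MB each, which is consistent with the
    files being shorten-compressed.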

    For further information, including a link to online documentation,
    please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003S03

    Institutions that have membership in the LDC during the 2003
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may license this publication for $1000.

    2. The Korean Telephone Conversations Transcripts consists of 100
    transcribed telephone conversations between native speakers of Korean.
    The transcripts correspond to the 100 conversations in Korean Telephone
    Conversations Speech. The recorded conversations last up to 30 minutes,
    of which the transcribed speech covers between 15 and 18 minutes.

    The Korean Telephone Conversations Transcripts contains 100 text files,
    totaling approximately 190K words and 25K unique words. All files are in
    Korean orthography, using the KSC-5601 character set. This publication
    is distributed by ftp.
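
    For readers planning to process the transcripts, the sketch below shows
    one way to decode KSC-5601 text and count words in Python. It assumes
    the files use the EUC-KR byte encoding of that character set (handled
    by Python's built-in "euc_kr" codec) and that words are separated by
    whitespace; neither detail is stated in this announcement, so the
    counts may not match the figures above. The sample string and function
    names are made up for illustration.

```python
from collections import Counter


def decode_transcript(raw: bytes) -> str:
    """Decode transcript bytes, assuming EUC-KR (an assumption, see above)."""
    return raw.decode("euc_kr")


def word_stats(text: str):
    """Return (total words, unique words) using whitespace tokenization."""
    counts = Counter(text.split())
    return sum(counts.values()), len(counts)


# Made-up sample standing in for the contents of one transcript file.
sample = "안녕 하세요 안녕".encode("euc_kr")
tokens, types = word_stats(decode_transcript(sample))
print(tokens, types)  # prints "3 2"
```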

    For further information, including a link to a sample transcript,
    please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T08

    Institutions that have membership in the LDC during the 2003
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may license this publication for $1000.

    3. The Korean Telephone Conversations Lexicon consists of 25,251
    words, and contains separate fields with phonological, morphological,
    and frequency information for each word. The lexicon covers the tokens
    occurring in the 100 telephone conversations transcribed and published
    as Korean Telephone Conversations Transcripts. The token coverage is 100%.

    The lexicon contains five tab-separated information fields:

            1. orthographic form in Hangul (headword), encoded in the
               KSC-5601 character set
            2. orthographic form in Yale romanization
            3. pronunciation
            4. frequency of the word in Korean Telephone Conversations
               Transcripts
            5. morphological analysis of the word
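
    The five-field layout above could be read with a short parser along
    the following lines. This is a sketch only: it assumes the fields
    appear in the order listed, that the frequency is a plain integer, and
    that each entry occupies one line. The class name, field names, and
    sample entry are hypothetical placeholders, not taken from the lexicon.

```python
from typing import NamedTuple


class LexiconEntry(NamedTuple):
    hangul: str         # field 1: headword in Hangul
    yale: str           # field 2: Yale romanization
    pronunciation: str  # field 3
    frequency: int      # field 4: count in the Transcripts corpus
    morphology: str     # field 5: morphological analysis


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Split one tab-separated line into its five fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    hangul, yale, pron, freq, morph = fields
    return LexiconEntry(hangul, yale, pron, int(freq), morph)


# Placeholder entry; real field contents follow the LDC documentation.
entry = parse_lexicon_line("학교\thakkyo\thak.kyo\t12\tN\n")
print(entry.yale, entry.frequency)  # prints "hakkyo 12"
```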

    This publication is distributed by ftp.

    For more information, including a link to a sample page from the
    lexicon, please visit:

    http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003L02

    Institutions that have membership in the LDC during the 2003
    Membership Year will be able to receive this corpus free of charge.
    Nonmembers may license this publication for $1500.

    4. The Korean Telephone Conversations Complete Set consists of the
    following:

    LDC2003S03 Korean Telephone Conversations Speech
    LDC2003T08 Korean Telephone Conversations Transcripts
    LDC2003L02 Korean Telephone Conversations Lexicon

    All three of the above publications may be licensed together as a
    package for the nonmember fee of $3000, a savings of $500 off the
    sum of the individual corpora licensing fees.

                                        *

    If you need additional information before placing your order, or
    would like to inquire about membership in the LDC, please send email to
    <ldc@ldc.upenn.edu> or call (215) 573-1275.

    --------------------------------------------------------------------
    Linguistic Data Consortium Phone: (215) 573-1275
    3600 Market Street Fax: (215) 573-2175
    Suite 810 email: ldc@ldc.upenn.edu
    Philadelphia, PA 19104-2653 www: http://www.ldc.upenn.edu


