Corpora: LDC-ELDA: Joint Distribution of LR

From: Magali Duclaux
Date: Wed Feb 27 2002 - 17:39:16 MET


    Cooperation Between ELDA and LDC - Distribution of Language Resources

    The Networking Data Centers project ("Net-DC", MLIS-5017) aims to
    improve the infrastructure for language resources by designing and
    implementing new modes of cooperation between the Linguistic Data
    Consortium (LDC) and the European Language Resources Distribution
    Agency (ELDA). Within the framework of this cooperation, LDC and ELDA
    are happy to announce the following joint distribution of language
    resources.

    Translanguage English Database (TED)
    ELRA reference:
    LDC reference:

    The Translanguage English Database (TED) is a corpus of recordings of
    oral presentations given at Eurospeech'93 in Berlin. The corpus takes
    its name from the high percentage of presentations given in English by
    non-native speakers of English. Two hundred twenty-four (224) oral
    presentations at the conference were successfully recorded, providing
    a total of about 75 hours of speech material. The recordings cover a
    large number of presenters speaking multiple variants of English, each
    talking on a specific topic over a relatively long stretch of time
    (15 minutes per presentation plus 5 minutes of discussion). This
    release of TED (six CD-ROMs) includes 188 speeches, without the ensuing
    discussion periods. The database was produced with the support of
    ELSNET. Associated text materials consist of ASCII versions of over 400
    proceedings papers and oral preparations supplied by the authors, as
    well as 250 speaker questionnaires.

    Translanguage English Database (TED) Transcripts
    ELRA reference:
    LDC reference:

    The Translanguage English Database (TED) Transcripts corpus contains
    transcriptions of thirty-nine of the 188 speeches in the TED Corpus
    (ELRA ref.: ; LDC ref.: ) recorded at Eurospeech'93 in Berlin. The
    thirty-nine transcripts in this publication are in Universal
    Transcription Format (UTF) and were prepared by the LDC. All UTF files
    in the transcript publication were validated against an included DTD,
    utf.dtd. Tables containing speaker demographic information
    and a cross-reference of file names from the TED audio corpus are

    For further information, please contact ELRA/ELDA or LDC at:

    ELRA/ELDA
    55-57 rue Brillat-Savarin
    F-75013 Paris, France
    Tel: +33 1 43 13 33 33
    Fax: +33 1 43 13 33 30
    Email: or

    LDC - Linguistic Data Consortium
    3615 Market Street, Suite 200
    Philadelphia, PA 19104-2608, USA
    Tel: (215) 898-0464
    Fax: (215) 573-2175
