Corpora: New Corpus

From: LDC Office (ldc@unagi.cis.upenn.edu)
Date: Wed Mar 08 2000 - 21:37:28 MET

    ********************************************************
    Santa Barbara Corpus of Spoken American English - Part I
    ********************************************************

    LDC is pleased to announce the availability of the
    Santa Barbara Corpus of Spoken American English -
    Part I. This release contains 14 speech files from
    the Santa Barbara Corpus of Spoken American English,
    which was collected by the University of California,
    Santa Barbara Center for the Study of Discourse under
    the direction of John W. Du Bois. Associate Editors
    were Wallace L. Chafe (UCSB), Charles Meyer (UMass,
    Boston), and Sandra A. Thompson (UCSB). The Santa
    Barbara Corpus of Spoken American English is part of
    the International Corpus of English (Charles W.
    Meyer, Director), representing the American Component.

    The Santa Barbara Corpus of Spoken American English
    is based on hundreds of recordings of natural speech
    from all over the United States, representing a wide
    variety of people of different regional origins,
    ages, occupations, and ethnic and social backgrounds.
    It reflects many ways that people use language in
    their lives: conversation, gossip, arguments,
    on-the-job talk, card games, city council meetings,
    sales pitches, classroom lectures, political
    speeches, bedtime stories, sermons, weddings, and
    more.

    Each speech file is accompanied by a transcript in
    which phrases are time stamped with respect to the
    audio recording. Personal names, place names, phone
    numbers, etc., in the transcripts have been altered to
    preserve the anonymity of the speakers and their
    acquaintances, and the audio files have been filtered
    to make these portions of the recordings
    unrecognizable.

    For the latest information on this corpus, please refer to
    the UCSB and Linguistic Data Consortium (LDC) web sites
    devoted to it:

            http://humanitas.ucsb.edu/depts/linguistics/research/csae/
            http://www.ldc.upenn.edu/Publications/SBC/

    These sites may also offer software or revised
    versions of the data for download.

    Institutions that have membership in the LDC during
    the 2000 Membership Year will be able to receive this
    corpus free of charge. Nonmembers may purchase the
    Santa Barbara Corpus of Spoken American English -
    Part I for $75.

    If you would like to order a copy of this corpus,
    please email your request to <ldc@ldc.upenn.edu>. If
    you need additional information before placing your
    order, or would like to inquire about membership in
    the LDC, please send email or call (215) 898-0464.


