[Corpora-List] Sum: Speech Corpus for Neural Network Training

From: Scott Drellishak (sfd@u.washington.edu)
Date: Tue Aug 24 2004 - 03:18:23 MET DST


    A few weeks ago, I posted a request for information about speech corpora of
    a particular kind to both the Linguist List and the Corpora-List. This is
    the (somewhat belated) summary.

    I described the corpora we are seeking as follows:

    "We are looking for a corpus that contains samples of many speakers
    producing many vowels (preferably in a less reduced register) that also
    contains human-validated pitch and formant (F1, F2, and F3) tracks and, if
    possible, bandwidth information. A corpus that contains more than just
    vowels is fine, since we can discard sections of the samples that do not
    suit our needs."

    I received five replies:

    1) John Lawler suggested MICASE (Michigan Corpus of Academic
        Spoken English), which is available here:

        http://www.lsa.umich.edu/eli/micase/micase.htm

    2) Lesley Carmichael suggested I post my request to the
        Corpora-List.

    3) Jane Edwards pointed me at the Switchboard Transcription
        Project:

        http://www.icsi.berkeley.edu/real/stp/index.html

    4) Susana Sotillo wrote, "At a recent conference (CALICO) I
        saw a demonstration of the Speechcalator (Allen Blackwell
        and associates). Why don't you write him at
        Carnegie-Mellon."

    5) Linda Bawcom offered an hour and a half of taped
        conversation that she used in her MA research.

    Many thanks to everyone who replied.

    Scott Drellishak
    University of Washington
    Seattle, WA



    This archive was generated by hypermail 2b29 : Tue Aug 24 2004 - 03:59:08 MET DST