[Corpora-List] speech corpora

From: Ingo Plag (plag@anglistik.uni-siegen.de)
Date: Thu May 13 2004 - 21:08:30 MET DST

    Dear Corpora Listers,

    I have two queries concerning English speech corpora.

    1. I am looking for an English speech corpus that is part-of-speech
    tagged and in which the sound files, transcriptions, and part-of-speech
    tags are aligned. It also needs to be of considerable size (> 100,000
    word tokens, if possible). Can anyone point me towards suitable corpora?

    So far I have found only one corpus that meets all of the criteria
    mentioned above: the Boston University Radio News Corpus.

    2. Despite hours of effort and the help of experienced colleagues, I
    have not managed to open the example files of the BU Radio News Corpus
    properly, whether with PRAAT, Wavesurfer, or Transcriber. All three
    programs open the sound file (.sph) without problems, but none of them
    can read the files containing the transcription or the part-of-speech
    tags and align this information with the waveform. Can anyone help?
    Which program(s) can do the job?

    Any help will be greatly appreciated.

    Many thanks in advance!

    Best regards,
    Ingo Plag

    --
    Ingo Plag
    Linguistics Research Center
    University of California at Santa Cruz
    Santa Cruz CA 95060
    USA
    

    plag@anglistik.uni-siegen.de

    phone (+1)-831-459-3823 fax (+1)-831-459-3334 (c/o Junko Ito)

    phone at home: (+1)-831-429-1306


