Re: [Corpora-List] syllable corpora

From: Simon G. J. Smith (smithsgj@eee.bham.ac.uk)
Date: Tue Sep 24 2002 - 20:10:52 MET DST


    In English, there is no absolute consensus on where syllable boundaries lie, so syllabic segmentation isn't trivial.

    That's not necessarily true of all languages, though. In Chinese, for example, each syllable is represented by one character in the writing system. What is contentious in Chinese is where the *word* boundaries lie!
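    To illustrate the point about characters and syllables, here is a minimal Python sketch that treats every Chinese character in a string as one syllable token and counts them. The function names and the sample sentence are my own, and the character test only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so take it as an illustration under those assumptions rather than a complete tokenizer.

```python
from collections import Counter


def is_han(ch):
    """True if ch lies in the basic CJK Unified Ideographs block.

    Assumption: extension blocks and compatibility ideographs are ignored.
    """
    return "\u4e00" <= ch <= "\u9fff"


def syllable_units(text):
    """Return one token per Chinese character, skipping everything else."""
    return [ch for ch in text if is_han(ch)]


# Made-up sample sentence; punctuation is dropped by is_han().
sample = "我喜歡語料庫，我喜歡語言。"
units = syllable_units(sample)
counts = Counter(units)

print(len(units), "syllable tokens")
print(counts.most_common(3))
```

    Note that this only gives you syllable boundaries; it says nothing about where the words begin and end.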

    So you might consider using a corpus of Chinese (for example, the CKIP corpus available from www.sinica.edu.tw). I don't know whether you'll find anything in romanized form. If not, you may need to enlist the help of a Chinese speaker, download Chinese reading software from www.unionway.com, and run the Chinese characters through a Pinyin (romanization) annotator, like http://www.all-day-breakfast.com/chinese/big5-simple.html.
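    The romanization step can be pictured as a per-character lookup. The sketch below uses a tiny hand-written table (TOY_PINYIN, which is hypothetical and covers three characters only) with tone numbers. It is not how the annotator linked above works internally, which I don't know; a real tool needs a full dictionary and some way of handling characters with more than one reading.

```python
# Toy table written by hand for illustration only (assumption: tone-number
# Pinyin, one reading per character).
TOY_PINYIN = {"語": "yu3", "料": "liao4", "庫": "ku4"}


def romanize(text, table=TOY_PINYIN, unknown="?"):
    """Map each character to a Pinyin syllable, or `unknown` if not listed."""
    return [table.get(ch, unknown) for ch in text]


syllables = romanize("語料庫")
print(" ".join(syllables))
```

    Because each character yields exactly one syllable, the output list is already a syllable-segmented version of the input.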

    Let me know how you get on if you try this.


