Re: [Corpora-List] What proportion of letter ngrams occur in English?

From: Simon King (Simon.King@ed.ac.uk)
Date: Mon Jan 26 2004 - 10:30:34 MET

    Bruce L. Lambert, Ph.D. wrote:
    > I am revisiting an issue I brought up to this list several years ago,
    > that is, how many legal/pronounceable strings can be generated from a
    > fixed alphabet for a string of a given length.

    One approach to this might be to consider legal syllables; there are
    strong phonotactic constraints on valid onsets and codas, both on
    allowed sequences and on total number of segments, which mean there are
    only a few thousand allowable syllables in English out of hundreds of
    thousands of possible phoneme sequences.
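The sketch below shows why onset and coda constraints shrink the space so much. It enumerates every onset + vowel + coda combination and compares that count with unconstrained phoneme sequences of the same maximum length. The phoneme inventory, the ONSETS and CODAS lists, and all function names are hypothetical toy choices, far smaller than real English phonotactics. Each symbol stands for one phoneme.

```python
from itertools import product

# Hypothetical toy inventory; real English has roughly 40 phonemes.
CONSONANTS = ["p", "t", "k", "s", "l", "r"]
VOWELS = ["a", "i", "u"]

# Hypothetical phonotactic constraints: only these onsets and codas are legal.
ONSETS = ["", "p", "t", "k", "s", "l", "r",
          "pl", "pr", "tr", "kl", "kr", "sp", "st", "sk",
          "spl", "spr", "str"]
CODAS = ["", "p", "t", "k", "s", "l",
         "lp", "lt", "lk", "sp", "st", "sk", "ps", "ts", "ks"]


def legal_syllables():
    """All onset + single-vowel nucleus + coda strings allowed by the toy rules."""
    return {o + v + c for o, v, c in product(ONSETS, VOWELS, CODAS)}


def max_syllable_length():
    """Length of the longest possible syllable (longest onset + vowel + longest coda)."""
    return max(len(o) for o in ONSETS) + 1 + max(len(c) for c in CODAS)


def unconstrained_count(max_len, inventory_size):
    """Number of phoneme sequences of length 1..max_len with no constraints at all."""
    return sum(inventory_size ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    legal = len(legal_syllables())
    total = unconstrained_count(max_syllable_length(), len(CONSONANTS) + len(VOWELS))
    print(legal, total)  # 810 597870
```

Even in this toy setting only about 0.14% of the candidate sequences are legal syllables. Words are then concatenations of syllables. Counting distinct strings of a given length needs care, because different syllable splits can spell the same string (for example "pa" + "sta" and "pas" + "ta").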

    Of course, this is not in terms of character strings. But, for made-up
    words like drug names, I would guess the letter-to-sound correspondence
    would be much more regular than for real words, so the approach would still work.
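Under the assumption that letters map regularly to sounds, a made-up name can be checked by trying to split its spelling into legal syllables. The sketch below uses a memoized recursive search for this. The orthographic ONSETS and CODAS sets and the function names are hypothetical. The code treats each syllable as exactly one vowel letter with an onset before it and a coda after it, so it ignores vowel digraphs such as "ea".

```python
from functools import lru_cache

# Hypothetical spelling-level inventories, assuming roughly one letter per sound.
ONSETS = {"", "b", "c", "d", "f", "l", "m", "n", "p", "r", "s", "t", "v", "x", "z",
          "bl", "br", "cl", "cr", "dr", "fl", "fr", "pl", "pr", "tr", "st", "str"}
VOWELS = {"a", "e", "i", "o", "u", "y"}
CODAS = {"", "b", "c", "d", "l", "m", "n", "p", "r", "s", "t", "x", "z",
         "nd", "nt", "st", "rt", "ct"}


def is_legal_syllable(s):
    """True if s has exactly one vowel letter, a legal onset before it and a legal coda after it."""
    vowel_positions = [i for i, ch in enumerate(s) if ch in VOWELS]
    if len(vowel_positions) != 1:
        return False
    i = vowel_positions[0]
    return s[:i] in ONSETS and s[i + 1:] in CODAS


def syllabify(word):
    """Return one split of word into legal syllables, or None if no split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def parse(start):
        # Tuple of syllables covering word[start:], or None if impossible.
        if start == len(word):
            return ()
        for end in range(start + 1, len(word) + 1):
            if is_legal_syllable(word[start:end]):
                rest = parse(end)
                if rest is not None:
                    return (word[start:end],) + rest
        return None

    result = parse(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    for name in ["zantac", "lipitor", "xbqz"]:
        print(name, syllabify(name))
```

A string with no valid split (like "xbqz") would be rejected as unpronounceable under these toy rules. Swapping in real onset and coda inventories, plus a letter-to-sound mapping for digraphs, would be needed for actual drug-name screening.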

    Simon

    -- 
    Dr. Simon King                               Simon.King@ed.ac.uk
    Centre for Speech Technology Research          www.cstr.ed.ac.uk
    For MSc/PhD info, visit  www.hcrc.ed.ac.uk/language-at-edinburgh
    



    This archive was generated by hypermail 2b29 : Mon Jan 26 2004 - 10:40:30 MET