Re: Corpora: How to compute number of syllables in a word

Bill Fisher (william.fisher@nist.gov)
Thu, 18 Feb 1999 16:32:46 -0500

On Feb 18, 2:28pm, Bruce L. Lambert wrote:
> Subject: Corpora: How to compute number of syllables in a word
> As part of my work on phonetic/phonological similarity between drug names,
> I have written a short lisp program to 'predict' how many syllables are in
> a word based on its orthographic representation. (In English, the number of
> vowels is a pretty good predictor, as long as you deal with the exceptions,
> e.g., double vowels, silent vowels at the end of words, etc.) I'm sure I
> must have re-invented the wheel.
>
> What other programs do people know about for computing the number of
> syllables in a word from its orthographic representation?
>
> -bruce

The trick is to convert the text to phonemes or phones (TTP), and
there are probably a lot of software packages around that will do this
for you. Then you just count the segments marked as syllabic, or count
the syllables directly, if you have a separate tier for them.
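
As a rough illustration of the counting step, here is a minimal Python
sketch. It assumes the TTP output is a list of ARPAbet-style phone
symbols with optional stress digits; that symbol set, the sample
transcription, and the names SYLLABIC and count_syllables are
assumptions made for this example, not the representation used by any
particular package.

```python
# Assumed ARPAbet-style vowel symbols; in this sketch these are the
# only segments treated as syllabic.
SYLLABIC = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def count_syllables(phones):
    """Count segments whose base symbol (stress digit stripped) is syllabic."""
    return sum(1 for p in phones if p.rstrip("012") in SYLLABIC)


# Hypothetical TTP output for the word "syllable".
print(count_syllables(["S", "IH1", "L", "AH0", "B", "AH0", "L"]))  # prints 3
```

If the transcription marks syllabic consonants with their own symbols,
those would have to be added to the set as well.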

One place you can get the code to do it is in the package
"aldistsm-1.1.tar.Z", available via anonymous ftp from me at
jaguar.ncsl.nist.gov in the subdirectory /pub. It's vanilla C
code intended to run under Unix. It does more than you need,
aligning two orthographic word strings to minimize the phonological
distance between corresponding words (including splits and merges),
but you can just use the functions and data you need. It's
general in that if a word is not in the dictionary, it falls
back on a pretty good set of TTP rules (94.5% segmental
accuracy) to get the word's phonological representation.
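
The dictionary-first, rules-second lookup can be sketched in Python as
follows. This is not the package's code: the tiny dictionary, the
ARPAbet-style symbols, and all function names are hypothetical, and the
fallback is a crude vowel-letter heuristic of the kind described in the
quoted message, standing in for a real set of TTP rules.

```python
import re

# Hypothetical pronouncing dictionary: word -> ARPAbet-style phones.
PRON_DICT = {
    "syllable": ["S", "IH1", "L", "AH0", "B", "AH0", "L"],
    "aspirin": ["AE1", "S", "P", "R", "IH0", "N"],
}

# Assumed set of syllabic (vowel) symbols.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def syllables_from_phones(phones):
    """Count the syllabic segments in a phone list."""
    return sum(1 for p in phones if p.rstrip("012") in VOWELS)


def syllables_from_spelling(word):
    """Crude stand-in for TTP rules: count groups of vowel letters,
    ignoring a final 'e' unless the word ends in 'le'."""
    w = word.lower()
    if len(w) > 2 and w.endswith("e") and not w.endswith("le"):
        w = w[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", w)))


def count_syllables(word):
    """Use the dictionary when the word is listed, else fall back."""
    phones = PRON_DICT.get(word.lower())
    if phones is not None:
        return syllables_from_phones(phones)
    return syllables_from_spelling(word)


for w in ["syllable", "ibuprofen", "codeine"]:
    print(w, count_syllables(w))
```

The spelling fallback will miss many exceptions that a proper rule set
handles, so it is only meant to show where the fallback plugs in.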

Help yourself.

- Bill F.

-- 
Bill Fisher