Re: Corpora: Syntactic/Phonologic network?

From: Mike Maxwell (
Date: Wed Jan 23 2002 - 14:36:52 MET

    Yuval Feinstein wrote:
    >[Are there]...networks according to
    >phonological information?
    >(e.g..."fish" and "wish" are similar

    A minimized Finite State Automaton (FSA) has some of the properties you
    mention, i.e. it constitutes a network based on spelling similarity (or
    phonological similarity, if you spell words "phonemically"). There was an
    article about how minimized FSAs can be constructed in a recent issue of
    Computational Linguistics. However,

    (1) There's no guarantee (and indeed, probably no way) that all phonological
    similarities are captured. For instance, how would you store the
    similarities between "finish" and "fish"? Without a theory (e.g. codas are
    more 'important' than onsets), it would be difficult to decide between two
    similarities, if only one can be represented in the network. (In this case,
    only one of the 'i's of "finish" can correspond to the "i" of "fish".)
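
    To make the ambiguity in (1) concrete, here is a small sketch (mine, not
    taken from any FSA toolkit) that lists every way "fish" can be lined up
    inside "finish" as a subsequence. The function name
    subsequence_alignments is hypothetical. It finds two alignments, one per
    'i', and nothing in the strings themselves says which to prefer.

```python
def subsequence_alignments(short, long):
    """Return every way to embed `short` in `long` as a subsequence.

    Each alignment is a tuple of indices into `long`, one per letter of `short`.
    """
    results = []

    def extend(i, start, chosen):
        # All letters of `short` have been placed: record this alignment.
        if i == len(short):
            results.append(tuple(chosen))
            return
        # Try every later position in `long` that carries the needed letter.
        for j in range(start, len(long)):
            if long[j] == short[i]:
                extend(i + 1, j + 1, chosen + [j])

    extend(0, 0, [])
    return results


alignments = subsequence_alignments("fish", "finish")
for alignment in alignments:
    print(alignment)
```

    The two tuples differ only in the index chosen for "i" (position 1 or
    position 3 of "finish"); choosing between them is where a theory would
    have to come in.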

    (2) There's no obvious way to extract the similarities that are implicit in
    a minimized FSA, short of asking for the intersection of the FSA with a list
    of regular expressions constructed according to some minimal distance
    algorithm. E.g. if you want to find words similar to "fish", you would have
    to intersect the FSA with expressions like "?fish", "f?ish" etc. (for
    single-point insertions), "?ish", "f?sh" etc. (for single-point
    replacements), "ish", "fsh" etc. (for single-point deletions), and "ifsh",
    "fsih" etc. (for metathesis).
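
    A rough sketch of the procedure in (2), under two assumptions of mine:
    the lexicon is a plain Python list standing in for the minimized FSA, and
    "?" means "any single letter". A real implementation would intersect
    automata rather than scan a list; the helper names (neighbor_patterns,
    matches, similar_words) are hypothetical.

```python
def neighbor_patterns(word):
    """Build the single-edit patterns for `word`, with "?" as a wildcard."""
    patterns = set()
    n = len(word)
    for i in range(n + 1):
        patterns.add(word[:i] + "?" + word[i:])          # insertion
    for i in range(n):
        patterns.add(word[:i] + "?" + word[i + 1:])      # replacement
        patterns.add(word[:i] + word[i + 1:])            # deletion
    for i in range(n - 1):
        # metathesis: swap two adjacent letters
        patterns.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    patterns.discard(word)  # swapping identical letters gives the word back
    return patterns


def matches(pattern, candidate):
    """True if `candidate` fits `pattern`, where "?" matches any one letter."""
    return len(pattern) == len(candidate) and all(
        p == "?" or p == c for p, c in zip(pattern, candidate)
    )


def similar_words(word, lexicon):
    """Words in `lexicon` one edit away from `word` (brute-force scan)."""
    patterns = neighbor_patterns(word)
    return sorted(
        {w for w in lexicon if w != word and any(matches(p, w) for p in patterns)}
    )


# Toy lexicon; a real one would be the language of the FSA.
lexicon = ["dish", "finish", "fish", "fishy", "fist", "fresh", "wish"]
print(similar_words("fish", lexicon))
```

    Note that "finish" is not returned: it is two insertions away from
    "fish", so single-point patterns never reach it.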

    Rhyming dictionaries do something like (1), under the theory that codas
    (and, if I recall, stress patterns) are more important than other factors.
    And if you're only interested in English or some other "commercially viable"
    language, spell checkers do something like (2). Of course they're more
    concerned with the kinds of errors that arise from spelling conventions than
    with sound, so e.g. the fact that the 'esh' sound is written in English with
    two letters gives you a possible spelling error ("fsih") that has no basis
    in phonology.
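
    For what it's worth, the rhyming-dictionary idea can be sketched by
    grouping words on everything from the last vowel onward. This is only a
    crude stand-in working on spelling: a real rhyming dictionary would use a
    phonemic transcription and take stress into account, and the function
    name rhyme_key is hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def rhyme_key(word):
    """Return the spelling from the start of the last vowel run to the end."""
    i = len(word) - 1
    # Skip the trailing consonant letters (the orthographic "coda").
    while i >= 0 and word[i] not in VOWELS:
        i -= 1
    if i < 0:
        return word  # no vowel letter at all; use the whole word as its key
    # Walk back over the vowel run so digraphs like "ea" stay together.
    while i > 0 and word[i - 1] in VOWELS:
        i -= 1
    return word[i:]


words = ["fish", "wish", "dish", "fist", "mist", "finish"]
groups = defaultdict(list)
for w in words:
    groups[rhyme_key(w)].append(w)

for key, members in groups.items():
    print(key, members)
```

    "finish" lands in the same group as "fish" here only because the sketch
    ignores stress, which is exactly the kind of extra factor a rhyming
    dictionary would have to weigh.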

         Mike Maxwell
         Linguistic Data Consortium
