Corpora: approximations (bounds) for edit distance

From: Computer Researcher (compresearch@hotmail.com)
Date: Thu Nov 29 2001 - 19:43:25 MET

  • Next message: Sandra Kuebler: "Corpora: Machine Learning Post at Univ. Tuebingen"

    Hi,

    Does anyone know good approximations (lower and/or upper bounds) to edit
    distance? (by using some statistical numbers that can be found by
    preprocessing of the strings)

    In the preprocess time we can transform the strings to a bunch of numbers
    (e.g., multi-dimensional vectors); and then use these vectors to approximate
    the edit distance between strings.

    I found a paper by Hadlock, F. (1988), proposing a "lower bound" by using
    frequencies of the letters in the string. Assuming that the alphabet is same
    for all strings, all frequency vectors will have same number of dimensions.
    And he defines a distance metric over these vectors, so that this distance
    (in the vector space) is a lower-bound to the actual edit distance.

    Do you know any other method that can achieve a similar goal?

    Thanks for your attention,

    CR

    _________________________________________________________________
    Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp



    This archive was generated by hypermail 2b29 : Thu Nov 29 2001 - 20:16:14 MET