Re: Corpora: approximations (bounds) for edit distance

From: Gregory Aist (aist+@andrew.cmu.edu)
Date: Fri Nov 30 2001 - 19:59:15 MET

  • Next message: Malvina Nissim: "Corpora: ESSLLI 2002 Student Session"

    Excerpts from mail: 30-Nov-101 Re: Corpora: approximations.. by Bruce L.
    Lambert@uic.edu
    > Date: Fri, 30 Nov 2001 12:42:54 -0600
    > To: CORPORA@HD.UIB.NO
    > From: "Bruce L. Lambert, Ph.D." <lambertb@uic.edu>
    > Subject: Re: Corpora: approximations (bounds) for edit distance
    >
    > Maybe I'm missing something, but the upper bound on edit distance between
    > two strings is always the length of the longer string, and the lower bound
    > is always zero (when the strings are identical).
    >
    > -bruce
    >

    I suspect CR is looking for
    least upper bounds and greatest lower bounds...

    >
    > At 06:43 PM 11/29/01 +0000, Computer Researcher wrote:
    > >Hi,
    > >
    > >Does anyone know good approximations (lower and/or upper bounds) to edit
    > >distance? (by using some statistical numbers that can be found by
    > >preprocessing of the strings)
    > >
    > >In the preprocess time we can transform the strings to a bunch of numbers
    > >(e.g., multi-dimensional vectors); and then use these vectors to
    > >approximate the edit distance between strings.
    > >
    > >I found a paper by Hadlock, F. (1988), proposing a "lower bound" by using
    > >frequencies of the letters in the string. Assuming that the alphabet is
    > >same for all strings, all frequency vectors will have same number of
    > >dimensions. And he defines a distance metric over these vectors, so that
    > >this distance (in the vector space) is a lower-bound to the actual edit
    > >distance.
    > >
    > >Do you know any other method that can achieve a similar goal?
    > >
    > >Thanks for your attention,
    > >
    > >CR

    Gregory Aist, Ph.D. (Carnegie Mellon, 2000.)
    aist@cs.cmu.edu



    This archive was generated by hypermail 2b29 : Fri Nov 30 2001 - 20:08:26 MET