Re: [Corpora-List] Frequency list of transformations

From: Viktor Pekar (v.pekar@wlv.ac.uk)
Date: Fri Jan 21 2005 - 10:39:06 MET


    Hi Marijke,

    Here is a Perl module that can tell which letters need to be
    removed/inserted/substituted in one word to get the other:
    http://cs.haifa.ac.il/~shlomo/talks/edit_distance/slides/Brew.pm.html
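
    If Perl is not an option, the same idea can be roughly sketched in
    Python with the standard difflib module. This is only an illustration
    and not what Brew.pm does internally: difflib aligns on longest matching
    blocks rather than computing a true edit distance, and the function
    names below (transformations, count_transformations) are made up for
    the example. It also adds no context letters, so a dropped double
    letter shows up as "r -> ()" rather than "rr -> r".

```python
from collections import Counter
from difflib import SequenceMatcher


def transformations(correct, wrong):
    """Yield (correct_part, wrong_part) for every span where the words differ."""
    matcher = SequenceMatcher(None, correct, wrong, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # replace: both parts non-empty; delete: wrong part empty;
            # insert: correct part empty
            yield correct[i1:i2], wrong[j1:j2]


def count_transformations(pairs):
    """Count transformations over an iterable of (wrong, correct) pairs."""
    counts = Counter()
    for wrong, correct in pairs:
        counts.update(transformations(correct, wrong))
    return counts


# The (wrong, correct) pairs from the original question.
pairs = [
    ("occurence", "occurrence"),
    ("occosion", "occasion"),
    ("commputer", "computer"),
    ("live", "life"),
    ("heavie", "heavy"),
    ("geat", "great"),
    ("save", "safe"),
]

counts = count_transformations(pairs)
for (src, dst), n in counts.most_common():
    # An empty side is printed as (), following the notation in the question.
    print(n, src or "()", "->", dst or "()")
```

    For a real tab-separated file, the pairs list could be filled by
    splitting each line on the tab character instead of being hard-coded.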

    Viktor

    ----- Original Message -----
    From: "Marijke Koster" <marijke@polderland.nl>
    To: <CORPORA@UIB.NO>
    Sent: Friday, January 21, 2005 8:44 AM
    Subject: [Corpora-List] Frequency list of transformations

    Dear corpora list members,

    Does anyone have a suggestion for a simple method / a script to extract
    a frequency list of transformations from a list of spelling errors and
    corrections?

    For example, take this tab-separated list:

    wrong correct
    ----- -------
    occurence occurrence
    occosion occasion
    commputer computer
    live life
    heavie heavy
    geat great
    save safe

    Applying the method should result in something like this:
    1 rr -> r
    1 a -> o
    1 m -> mm
    2 f -> v
    1 y -> ie
    1 r -> ()

    Thanks in advance,
    Marijke Koster
    ______________________________________
    Marijke Koster, linguistic engineer
    Polderland Language & Speech Technology BV
    The Netherlands
    http://www.polderland.nl
    Phone: +31.24.352 28 66
    Fax: +31.24.352 28 60


