[Corpora-List] Summary: frequency list of transformations

From: Marijke Koster (marijke@polderland.nl)
Date: Tue Jan 25 2005 - 09:33:27 MET

    Dear corpora list members,

    Thank you all for your valuable contributions to my question.

    The suggestion to use the Levenshtein algorithm for this purpose has
    been particularly useful. The Levenshtein distance (LD) measures the
    difference between two strings, denoted here by s1 and s2: it is the
    minimum number of deletions, insertions or substitutions required to
    transform s1 into s2. The greater the distance, the more different the
    strings are.
    More information can be found at http://www.merriampark.com/ld.htm.
    The Brew edit distance has also been suggested.
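
    To make the definition concrete, here is a minimal Python sketch of
    the Levenshtein distance using the usual row-by-row dynamic
    programming. The function name and the test words are my own
    illustration and are not taken from any of the scripts I received.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character deletions, insertions or
    substitutions needed to transform s1 into s2."""
    # previous[j] is the distance between the prefix of s1 handled so far
    # and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute c1 by c2, or keep it
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

    Only two rows of the table are kept at a time, so memory use grows
    with the length of s2 rather than with the product of both lengths.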

    Some of you have sent me a ready-made script (using for example a
    string-edit alignment and a standard diff algorithm) for extracting a
    list of transformations, for which many thanks.
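
    For readers who want to try something similar, the sketch below
    shows one possible way to build such a frequency list. It is not one
    of the scripts I received: it assumes Python's standard difflib
    module as the diff step, and the helper names and the word pairs are
    invented for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def transformations(error: str, correction: str):
    """Yield (wrong, right) substring pairs where the two words differ.

    An empty string on the left means an insertion, an empty string on
    the right means a deletion.
    """
    matcher = SequenceMatcher(None, error, correction, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield error[i1:i2], correction[j1:j2]


def transformation_frequencies(pairs):
    """Count every transformation over an iterable of (error, correction) pairs."""
    counts = Counter()
    for error, correction in pairs:
        counts.update(transformations(error, correction))
    return counts


# Invented example pairs, not taken from the children's texts.
pairs = [("cot", "cat"), ("dog", "dag"), ("ct", "cat")]
for (wrong, right), n in transformation_frequencies(pairs).most_common():
    print(f"{wrong!r} -> {right!r}: {n}")
```

    Note that difflib aligns on longest common substrings, so for some
    word pairs it will not report the same edits as a Levenshtein
    alignment would.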

    Some of you were interested in the list of spelling errors and
    corrections. Please let me elaborate.
    In cooperation with the Fryske Akademy ("The Frisian Academy"),
    Polderland developed the "Fryske TaalHelp" last year. The product is
    a unique combination of a Frisian spellchecker and the electronic
    version of a Frisian - Dutch dictionary, fully integrated in
    Microsoft(r) Office.
    We are now working on a children's version of the Fryske TaalHelp.
    Suggestions offered by the spellchecker will be adapted to the
    children's proficiency level. We have a set of texts written by Frisian
    children (approximately 20,000 words) in which spelling errors are
    tagged as such and in which the correction has been added. This list
    gives us the opportunity to do some research on the sort of errors
    children tend to make. The conclusions will be integrated in the
    spellchecker's suggestion engine.
    Unfortunately, I cannot share the list with you.

    Thanks for all your help,
    Marijke Koster
    ______________________________________
    Marijke Koster, linguistic engineer
    Polderland Language & Speech Technology BV
    The Netherlands
    http://www.polderland.nl
    Phone: +31.24.352 28 66
    Fax: +31.24.352 28 60