[Corpora-List] sorting OHG (non-ASCII) in PERL

From: Henning Reetz (henning.reetz@uni-konstanz.de)
Date: Tue Feb 04 2003 - 15:56:41 MET


    Hi,

    stupid question but perhaps the freaks can help me:

    We're building a database of Old High German words. Obviously, there
    are some characters that are not in ASCII (letters with diacritics
    like stress marks ' and carets ^) and chars that do not follow the
    'normal' sorting order (like 'uu' for 'w'). One possibility would be
    to recode these chars (e.g. get rid of the diacritics for sorting and
    put them back on in the output). But is there a more elegant and
    general way (e.g. in case one would like to have a long 'e' after the
    short 'e', etc.), so that one could use it for other scripts as well?
    (Unicode code-point order does not necessarily reflect the
    'intuitive' sequence in a language.) Is there a module to tell Perl
    which sorting sequence one would like to use, or do I have to program
    it myself?
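
    The recoding idea could be sketched roughly as follows. This is only
    an illustration, written in Python rather than Perl. The alphabet
    order, the treatment of 'uu' as 'w', the use of a circumflex for long
    vowels, and the names ALPHABET, RANK, EQUIVALENTS and sort_key are all
    assumptions made up for the example, not an existing module.

```python
import unicodedata

# Hypothetical collation order: one entry per sorting unit.
# Assumption: long vowels carry a circumflex and sort right after
# their short counterparts.
ALPHABET = ["a", "â", "b", "c", "d", "e", "ê", "f", "g", "h", "i", "î",
            "j", "k", "l", "m", "n", "o", "ô", "p", "q", "r", "s", "t",
            "u", "û", "v", "w", "x", "y", "z"]
RANK = {unit: position for position, unit in enumerate(ALPHABET)}

# Assumption: the digraph 'uu' is sorted as if it were 'w'.
EQUIVALENTS = {"uu": "w"}


def sort_key(word):
    """Return a list of ranks used only for comparison, never for output."""
    # Decompose, drop the stress mark (combining acute accent, U+0301),
    # then recompose so that the circumflex letters stay single units.
    text = unicodedata.normalize("NFD", word.lower())
    text = text.replace("\u0301", "")
    text = unicodedata.normalize("NFC", text)

    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in EQUIVALENTS:
            unit, i = EQUIVALENTS[pair], i + 2
        else:
            unit, i = text[i], i + 1
        # Units missing from ALPHABET sort after it, in code-point order.
        key.append(RANK.get(unit, len(ALPHABET) + ord(unit)))
    return key


words = ["uuort", "zît", "er", "êr", "ést", "es", "vater"]
ordered = sorted(words, key=sort_key)
# ordered == ['er', 'es', 'ést', 'êr', 'vater', 'uuort', 'zît']
```

    Because the key is only used for comparing, the original spellings
    (with their diacritics) come back unchanged in the output, so nothing
    has to be "put back on" afterwards.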

    Thanks for any hints.

    Henning Reetz

    -- 
    

    Department of Linguistics
    University of Konstanz
    Fach D186
    78457 Konstanz
    Germany
    email: henning.reetz@uni-konstanz.de


