RE: [Corpora-List] sorting OHG (non-ASCII) in PERL

From: Alex Murzaku (lists@lissus.com)
Date: Tue Feb 04 2003 - 16:45:24 MET


    While not claiming to be a "freak", I thought it might help to look at:
    http://www.perldoc.com/perl5.8.0/lib/Unicode/Collate.html
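
As a rough sketch of what the module linked above offers: Unicode::Collate sorts according to the Unicode Collation Algorithm, so accented letters are ordered next to their base letters instead of by code point. The word list below is only illustrative, and treating 'uu' as 'w' through the `preprocess` option is an assumption about the desired order, not something the module does by default.

```perl
use strict;
use warnings;
use utf8;                      # the source below contains UTF-8 literals
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Illustrative word list (an assumption, not taken from the database).
my @words = ('uuort', 'ûz', 'ebur', 'êra', 'erda');

# Plain sort compares code points, so 'ê' (U+00EA) lands after 'u'.
my @by_codepoint = sort @words;

# The collator compares base letters first and diacritics second.
# The preprocess hook rewrites 'uu' to 'w' for key building only;
# the strings that come back from sort() are the untouched originals.
my $collator = Unicode::Collate->new(
    preprocess => sub {
        my ($str) = @_;
        $str =~ s/uu/w/g;      # assumed rule: the digraph 'uu' sorts as 'w'
        return $str;
    },
);
my @collated = $collator->sort(@words);

print "code point order: @by_codepoint\n";
print "collated order:   @collated\n";
```

With this list the collated order should come out as ebur, êra, erda, ûz, uuort, whereas the plain sort puts êra and ûz at the very end.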
     
    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Henning Reetz
    Sent: Tuesday, February 04, 2003 9:57 AM
    To: corpora@hd.uib.no
    Subject: [Corpora-List] sorting OHG (non-ASCII) in PERL

    Hi,

    stupid question but perhaps the freaks can help me:

    We're building a database of Old High German words. Obviously, there are
    some characters that are not in ASCII (diacritics like stress marks '
    and carets ^) and chars that do not follow the 'normal' sorting order
    (like 'uu' for 'w'). One possibility would be to recode these chars
    (e.g. get rid of the diacritics for sorting and put them back in the
    output), but is there a more elegant and general way (e.g. in case one
    would like to have a long 'e' after the short 'e' etc.), so that one
    could use it for other scripts as well? (Unicode code point order does
    not necessarily reflect the 'intuitive' sequence in a language.)
    Is there a module to tell Perl which sorting sequence one would like to
    use, or do I have to program it myself?

    Thanx for any hints.

    Henning Reetz
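
The recoding idea from the question (drop the diacritics for sorting, keep them in the output) can be sketched as a sort key computed per word, with the original string carried along unchanged. This is a minimal sketch under assumptions: the helper name `sort_key`, the word list, and the 'uu' to 'w' rule are all hypothetical, and Unicode::Normalize is used here only to split accented letters into base letter plus combining mark.

```perl
use strict;
use warnings;
use utf8;                          # the source below contains UTF-8 literals
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: build the string that is actually compared.
sub sort_key {
    my ($word) = @_;
    my $key = NFD(lc $word);       # 'ê' becomes 'e' + combining circumflex
    $key =~ s/\p{Mn}//g;           # remove the combining marks
    $key =~ s/uu/w/g;              # assumed rule: 'uu' sorts as 'w'
    return $key;
}

# Illustrative word list (an assumption, not taken from the database).
my @words = ('uuort', 'ûz', 'ebur', 'êra', 'erda');

# Decorate, sort, undecorate: compare by key, fall back to the original
# string on ties so that a plain 'e' form precedes its 'ê' counterpart.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] }
    map  { [ sort_key($_), $_ ] } @words;

print "$_\n" for @sorted;
```

The output keeps the diacritics because only the keys are recoded; the trade-off compared with a collation module is that every language-specific rule has to be written into the key function by hand.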


