Corpora: Diacritics, Unicode

From: Tadeusz Piotrowski (tadpiotr@plusnet.pl)
Date: Sat Apr 21 2001 - 11:27:20 MET DST


    This thread seems to be getting boring for some people, but here is just one more comment.
    I would like to suggest that the weakening position of diacritics in
    written language, as seen in the reluctance to use them in e-mail, at least
    in some languages such as Polish, also stems from the fact that they no
    longer reflect contemporary speech. The Polish 'nasal' vowels, indicated by
    diacritics on 'a' and 'e' (ą, ę), in fact no longer exist: children have to
    be taught to use the relevant graphemes, since they pronounce other vowels
    or other phoneme sequences than the ones the characters indicate. This is,
    of course, a common problem, but it shows where the wish to get rid of all
    diacritics originates.
    What is more interesting, I feel, is what we do with e-mail texts in corpus
    building. Should those diacritic-less texts be treated as deviant and,
    consequently, standardized/normalized, or should the lack of diacritics be
    treated as a distinctive feature of this particular type of text?
    Worse still, the Polish Press Agency sends a free news bulletin to all
    interested parties, and for a long period it was diacritic-less. The
    bulletin is nice and has lots of interesting words. Again: is it deviant
    until the diacritics are inserted? Or is this an interesting feature of the text?
    And so on and so on.
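One way to keep both options open is to leave the e-mail or bulletin text untouched, record the absence of diacritics as a text-level attribute, and store a diacritic-folded search key so that queries match both spellings. The Python sketch below illustrates that idea under assumptions of its own: the function names, the annotation fields and the 1% threshold are hypothetical, not an established corpus tool, and a short message may legitimately contain no diacritic letters at all.

```python
import unicodedata

# Polish letters that carry diacritics (lower- and upper-case).
POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def diacritic_ratio(text: str) -> float:
    """Share of alphabetic characters that are Polish diacritic letters."""
    # Compose sequences such as 'e' + U+0328 (combining ogonek) into 'ę'.
    text = unicodedata.normalize("NFC", text)
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in POLISH_DIACRITICS for ch in letters) / len(letters)


def fold(text: str) -> str:
    """Diacritic-insensitive key: 'gęślą' and 'gesla' both map to 'gesla'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'ł'/'Ł' have no canonical decomposition, so NFD leaves them intact.
    return stripped.translate(str.maketrans("łŁ", "lL"))


def annotate(text: str, threshold: float = 0.01) -> dict:
    """Record missing diacritics as an attribute instead of rewriting the text."""
    ratio = diacritic_ratio(text)
    return {
        "text": text,                             # original kept unchanged
        "search_key": fold(text),                 # for diacritic-blind queries
        "diacritic_ratio": ratio,
        "diacritics_absent": ratio < threshold,   # hypothetical threshold
    }


if __name__ == "__main__":
    print(annotate("Zażółć gęślą jaźń")["diacritics_absent"])
    print(annotate("Zazolc gesla jazn")["diacritics_absent"])
    print(fold("Zażółć gęślą jaźń"))
```

The design choice here is to treat missing diacritics as metadata rather than as an error to be corrected, which leaves the "deviant or distinctive?" question to whoever queries the corpus. Note the special case for ł: unlike ą, ę, ó, ś and the rest, it is not decomposable under Unicode normalization, so simply stripping combining marks would not fold it.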
    As for Unicode: Tony McEnery has shown that it does not cope satisfactorily
    with a number of languages of India (those with non-Latin alphabets).
    Regards

    Tadeusz Piotrowski
    ***************************************************************
    Department of English
    Opole University
    Oleska 48
    Opole
    POLAND

    mailing address
    Chrobrego 20
    PL-55-020 Zorawina (Zórawina)
    phone/fax (+48)71-3165847
    mobile (+48)607159263
