Corpora: Diacritics

From: Rich Foley (rfoley@levi.urova.fi)
Date: Fri Apr 20 2001 - 14:45:53 MET DST

    I was pleasantly surprised to see the discussion on diacritics revived.
    Herewith a few comments:

    * It is sad to think there are linguists who are intent on eradicating
    diacritics as so many flyspecks from various orthographies, however
    motivated this may be by myopic software that can't deal with anything
    beyond ASCII 127.

    * I have always had a soft spot for ornate alphabets and elegant
    syllabaries, such as Georgian and Thai, and unabashedly
    extend this aesthetic to the lowliest diacritic. The credit here goes
    to the Jesuits for their strict training in Greek breathings, iota
    subscripts and accents in my youth.

    * Aversions to diacritics seem ominously connected with a lack of
    the same in English. It would be therapeutic if the linguistic
    (imperialist) tables were turned and speakers of Hebrew or Arabic began
    wondering out loud why we clutter English texts with all those vowels.
    Then again, I suspect the uniform structure of roots in those languages
    would make such a recommendation as relativistic as the Anglocentric
    admonitions at issue on the list. My feeling is that we'll be ready for
    orthographic hygiene across languages as soon as the dispute over
    variant Klingon orthographies has been resolved
    (http://www.kli.org/tlh/sounds.html) - by the Klingons.

    * For some interesting political or linguistic reason, German legal
    texts in the EU (EUR-LEX database) use the digraphs ae, ue and oe
    instead of a-, u- and o-umlaut. This is surprising inasmuch as umlauts
    are reasonably "establishment" as diacritics go and none of the other
    official languages seem to have adopted a comparable practice. The only
    other forum where I have seen this convention is Eurosport, where one
    sees Finnish surnames like Hämäläinen or Määttä rendered as
    Haemaelaeinen, Maeaettae.

    * Arguing that natives sometimes leave out diacritics and that the
    latter are therefore probably dispensable strikes me as tantamount to
    studying telegrams in English and concluding that the language could get
    by without articles and prepositions.

    * With EU enlargement to include the Czech Republic, Estonia, Hungary,
    Poland and Slovenia, the minds behind (and in front of) computers in
    general and email programs in particular had better quit while they are
    behind and learn to deal with diacritics.

    * I once wrote a little program on a computer course that would take a
    Finnish text and replace the double (i.e. long) consonants and vowels
    with the corresponding single character and an acute accent (e.g.,
    kaataa 'to pour' -> kátá) à la Hungarian vowel orthography. (Not
    surprisingly, I had to design a new character set to get consonants with
    the appropriate diacritic.) Comparisons of input and output texts
    revealed that such a reform would cut paper consumption by 10-15%.
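
For the curious, the doubled-letter replacement described in the last point can be roughly sketched in Python. This is not the original course program: the names `shorten_doubles` and `visible_length` are made up for illustration, and the Unicode combining acute accent (U+0301) is assumed as a stand-in for the custom character set mentioned above.

```python
import re
import unicodedata

# Assumption: a Unicode combining acute accent replaces the custom
# character set the original program needed for accented consonants.
COMBINING_ACUTE = "\u0301"

# Any letter (no digits, no underscore) immediately followed by itself.
_DOUBLED = re.compile(r"([^\W\d_])\1", re.IGNORECASE)


def shorten_doubles(text: str) -> str:
    """Replace each doubled letter with one letter plus an acute accent."""
    # Compose first so that letters such as 'ä' are single code points
    # and the back-reference can recognise 'ää' as a doubled letter.
    composed = unicodedata.normalize("NFC", text)
    marked = _DOUBLED.sub(lambda m: m.group(1) + COMBINING_ACUTE, composed)
    # Compose again so 'a' + accent becomes the precomposed 'á' where one exists.
    return unicodedata.normalize("NFC", marked)


def visible_length(text: str) -> int:
    """Count characters that occupy space, ignoring combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    word = "kaataa"
    short = shorten_doubles(word)
    print(word, "->", short)
    print(visible_length(word), "->", visible_length(short))
```

Comparing `visible_length` before and after on a longer Finnish text gives a crude estimate of the space saved. The sketch makes no attempt to reproduce the 10-15% figure, which depends on the texts used.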

    Rich Foley
    University of Lapland


