Re: Corpora: a particular type of sloppiness

From: Alexandr Rosen (rosen@chomsky.ruk.cuni.cz)
Date: Wed Apr 11 2001 - 15:40:32 MET DST

    > From: "Tadeusz Piotrowski" <tadpiotr@plusnet.pl>

    > But in fact I wanted to report on an interesting type of sloppiness in a
    > language with diacritics. Polish has nine diacritics, or eighteen, when
    > capital letters are counted separately. The point is that very few people
    > bother about diacritics in e-mails; they use what is sometimes called pidgin
    > Polish: only the Latin (or English) characters are used. (You have to press
    > two keys at the same time when you want to use diacritics, but only one
    > when you do not. Economy of language...)
    > A very (VERY) careful writer will use diacritics, or you can tell from the
    > diacritics in a mail that its author was writing offline. In fact, we have a
    > nice gradation: a proper letter with diacritics, a proper letter without
    > diacritics, a casual letter, etc. This device tells you a lot about the
    > speaker(?)/writer.
    > I wonder what people do with other diacritic-rich languages. German?
    > French? Czech? Is it the same as in Polish?

    I have always thought that the absence of diacritics in most Czech e-mails is
    due to the writer's awareness of the danger of character codes becoming garbage
    on the way, rather than due to the writer being lazy. In fact, in most cases (11
    out of 15) you only need one keystroke to produce an accented lower-case
    character on the standard Czech keyboard. A decent keyboard mapping table (not
    the default one in Czech MS Windows) with Caps Lock on also produces accented
    upper-case characters with a single keystroke.
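
    To illustrate the kind of garbage I mean, here is a minimal sketch in
    Python. The sample sentence, the choice of ISO 8859-2 as the sender's
    encoding, and the helper strip_to_7bit are my own assumptions for the sake
    of the example, not a description of any particular mail system. It shows
    the same bytes being read under two other 8-bit code pages, and what is
    left if a relay clears the eighth bit.

```python
# Illustrative sketch only: sample text and encodings are assumptions.
text = "Příliš žluťoučký kůň"

# Sender encodes the message as ISO 8859-2 (Latin-2).
sent = text.encode("iso8859_2")

# Receiver wrongly assumes ISO 8859-1 (Latin-1): every accented byte
# is shown as some other Western European character.
as_latin1 = sent.decode("latin_1")

# Receiver assumes Windows-1250: most letters survive, but the bytes
# for s, z and t with caron are assigned differently and come out wrong.
as_cp1250 = sent.decode("cp1250")


def strip_to_7bit(data: bytes) -> bytes:
    """Clear the high bit of every byte (hypothetical 7-bit relay)."""
    return bytes(b & 0x7F for b in data)


# After losing the eighth bit the accented letters turn into unrelated
# ASCII characters, and the original cannot be recovered.
as_7bit = strip_to_7bit(sent).decode("ascii")

if __name__ == "__main__":
    for label, value in [
        ("original", text),
        ("read as Latin-1", as_latin1),
        ("read as cp1250", as_cp1250),
        ("8th bit stripped", as_7bit),
    ]:
        print(f"{label:>17}: {value}")
```

    A writer who leaves the diacritics out sends plain ASCII, which reads the
    same under all three interpretations, so nothing can go wrong on the way.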

    I believe it is very unfortunate that we still don't have a reliable way of
    using a Latin-based (or any other) writing system on the Internet, sloppily or
    not.

    Regards

    Alexandr Rosen

    Institute of Theoretical and Computational Linguistics
    Faculty of Philosophy, Charles University, Prague

    address: UTKL FF UK, Celetna 13, CZ 110 00 Praha 1, Czech Republic
    tel.: +420-2-24491858, e-mail: alexandr.rosen@ff.cuni.cz
    http://utkl.ff.cuni.cz/~rosen/


