Re: Corpora: a particular type of sloppiness

From: James L. Fidelholtz (jfidel@siu.buap.mx)
Date: Fri Apr 13 2001 - 19:43:55 MET DST

    On Wed, 11 Apr 2001, Alexandr Rosen wrote:
    >
    >> From: "Tadeusz Piotrowski" <tadpiotr@plusnet.pl>
    >[snip]
    >> I wonder what do the people do with other diacritic-rich languages? German?
    >> French? Czech? Is it the same as in Polish?

    Marco Antonio Esteves da Rocha <marcor@cce.ufsc.br> answered:
    << [snip]
    Curious idea. The absence of diacritics in Portuguese is what disturbs
    me, not their inclusion. It is difficult to be sure whether people on
    the other end of the message have the equipment and configuration to
    actually see those diacritics on screen in their e-mail editor. In fact,
    what appears on different screens around the world when you produce
    diacritics in your own equipment is quite unpredictable and may be
    unreadable for the recipient. So people writing messages in Portuguese
    often choose not to use them for safety.
    <<
    But it makes me feel very uncomfortable. It is not at all the feeling of
    using pidgin Portuguese but of writing in a different language,
    [snip]>>

    and Alexandr Rosen <rosen@chomsky.ruk.cuni.cz> comments:

    >I have always thought that the absence of diacritics in most Czech e-mails is
    >due to the writer's awareness of the danger of character codes becoming
    >garbage on the way, rather than due to the writer being lazy. In fact,
    >[snip]
    >I believe it is very unfortunate that we still don't have a reliable way of
    >using a Latin-based (or any other) writing system on the Internet, sloppily or
    >not.
    >

            Well, I live and often write in a Spanish-speaking country, and
    the answer to Tadeusz's question is that it varies tremendously. I
    would say the plurality, if not the majority, of people write without
    accents for the reasons adumbrated by Marco Antonio and Alexandr. Of
    the large number of messages I receive in Spanish from people who *do*
    use accents, a *very* large proportion arrive botched up with
    different kinds of codes. This includes even messages from myself
    (via another server, of course), despite the fact that my server is set
    up to read the encoding which includes Spanish. God knows what the
    reason is (I guess I ought to, but then we all have our little areas of
    inexplicable ignorance). ;)
            I am not a native speaker of Spanish, and have argued in
    published articles for the general elimination of accents and diacritics
    from Spanish (and would be brash enough to make the same argument for
    almost *any* language with diacritics, including Polish, Portuguese and
    Czech). My reasons are the low functional load of the diacritics in
    general (messages I receive in Spanish without diacritics are close to
    100% legible, and very close indeed to the legibility of messages with
    diacritics; I'd bet the same is true for Czech, and I know it is for
    Polish: the ó [if that got butchered, it's an 'o' with an acute
    accent over it], for example, is almost 100% predictable) and the
    general dropping of diacritics in handwriting, etc. However, because of
    the existence of the Royal Spanish Academy of the language and the
    general inertia of tradition, this suggestion has a close to zero
    probability of being accepted. So, being a hard-headed SOB, I demand
    (especially of myself) the proper use of accents in *all* written
    Spanish communications. Given the problems alluded to above for Polish
    and Czech, however, which apply equally to Spanish, one wishes
    to spare those who receive one's communications the systematic
    butcheries which accented letters are prone to undergo. So I always
    write the letter followed by the accent, which for Spanish is just one
    extra keystroke (again, I'm too lazy to train myself in how to adapt the
    keyboard). Although I am at least the equivalent of an educated native
    Spanish speaker in my use of accents (a major problem for writers of
    Spanish, by the way, because of some not-quite-predictable uses of
    accents, and a few problematical or arbitrary cases), even I have some
    lapses in my accentuation. The point here is that, except in a
    hypothetical language in which accents really carried a functional load,
    leaving them off will do very limited damage to the communication. I
    say 'hypothetical' because people *do* leave off accents at the drop of
    a hat; in such a language that would, in theory, impede communication,
    and so the orthography would tend to be rapidly
    modified. I strongly suspect that even the Portuguese example cited is
    exaggerated, and would have limited effect on actual communication.
            Jim
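
The convention described above, writing each letter followed by its accent, can be sketched in Python. This is only an illustration under assumptions that are not in the message: the ASCII apostrophe stands in for "the accent", only the acute accent is rewritten (so ñ and ü are left as they are), and the function names accent_after_letter and strip_diacritics are hypothetical. The second function shows the alternative the thread discusses, dropping diacritics altogether.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # Unicode combining acute accent


def accent_after_letter(text: str, mark: str = "'") -> str:
    """Rewrite acute-accented letters as base letter + ASCII mark.

    Assumption: the apostrophe is the stand-in for the accent.
    """
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    rewritten = "".join(mark if ch == COMBINING_ACUTE else ch
                        for ch in decomposed)
    # NFC recomposes any marks we did not touch (e.g. the tilde on n).
    return unicodedata.normalize("NFC", rewritten)


def strip_diacritics(text: str) -> str:
    """Drop every combining mark, leaving only the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(accent_after_letter("Benem\u00e9rita Universidad Aut\u00f3noma"))
    print(strip_diacritics("M\u00e9xico"))
```

The rewritten text is pure ASCII wherever the only diacritic is the acute accent, so it should survive a mail path that mangles 8-bit characters; letters such as ñ would need a separate convention.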

    -- 
    James L. Fidelholtz			e-mail: jfidel@siu.buap.mx
    Posgrado en Ciencias del Lenguaje	tel.: +(52-2)229-5500 x5705
    Instituto de Ciencias Sociales y Humanidades	fax: +(01-2) 229-5681
    Benemérita Universidad Autónoma de Puebla, MÉXICO
    


