Re: Corpora: a particular type of sloppiness

From: Monica Merino (mmerino@visto.com)
Date: Wed Apr 18 2001 - 15:35:42 MET DST


    As a native speaker of Spanish I can tell you that ALL Spanish speakers would
    face terrible comprehension problems without diacritics. In many cases,
    diacritics in Spanish are used to "distinguish" homonyms. Take for example
    these two cases:
    El niño *se* cayó (The boy fell down)
    *Sé* que será difícil entenderlo (I know it's going to be difficult to
    understand)
    In the first case we're talking about the reflexive pronoun *se* (part of
    the verb *caerse*, "to fall"), whereas in the second case we're talking
    about the first-person singular present of the verb *saber*, "to know".
    Perhaps in isolated sentences like these two, and in the "relaxed" and
    rather artificial situation of "reading examples", these diacritics might
    not seem crucial for comprehension. But I can't imagine what it would be
    like to read a 5,000-word Spanish text with no diacritics!
    It would take ages for native speakers of a language with diacritics to get
    used to one without them! And anyway, what's the problem with diacritics?
    Monica Merino

    -----Original Message-----
    From: Alexandr Rosen <Alexandr.Rosen@ff.cuni.cz>
    Sent: Tue, 17 Apr 2001 14:15:28 +0200 (MET DST)
    To: corpora@hd.uib.no
    Subject: Re: Corpora: a particular type of sloppiness

    > I am not a native speaker of Spanish, and have argued in
    > published articles for the general elimination of accents and diacritics
    > from Spanish (and would be brash enough to make the same argument for
    > almost *any* language with diacritics, including Polish, Portuguese and
    > Czech). My reasons are low functional load for the diacritics in
    > general (messages I receive in Spanish without diacritics are close to
    > 100% legible, and very close indeed to the legibility of msgs with
    > diacritics; I'd bet the same is true for Czech, and I know it is for
    > Polish-- the ó [if that got butchered up, it's an 'o' with an acute
    > accent over it], for example, is almost 100% predictable), also the
    > general dropping of diacritics in handwriting, etc.

    I don't know about Spanish, but at least for Czech, I disagree. Writing Czech
    text without diacritics is just another way of butchering it up, although
    admittedly not as bad as letting the servers do it.

    The Czechs always use diacritics, except when technology does not know better:
    telegraph, SMS, e-mail. Then the writer must pay special attention to prevent
    misunderstanding. And proper names often would not make sense unless
    transliterated.

    I think that by now we should have gotten past the stage where information
    technology forces us into something like that.

    Regards,

    Alexandr [Rosen]



