Re: Corpora: a particular type of sloppiness

From: Bruce Lambert (lambertb@uic.edu)
Date: Fri Apr 20 2001 - 00:06:34 MET DST


    Isn't this question of diacritics, at some level at least, an empirical
    one? That is, how does frequency of diacritic use vary in formal (e.g.,
    Spanish newspaper text) vs. informal (e.g., email) text? I know there are
    lots of subtleties that would need to be worked out to make any such
    comparison valid, but it would be interesting nonetheless to see how common
    or uncommon the use of diacritics is in various languages that use them.
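As a rough illustration of how such a frequency comparison might be quantified, here is a minimal Python sketch. It is an assumption for illustration, not a method anyone in this thread proposed. The helper name `diacritic_rate` is hypothetical, and "carries a diacritic" is approximated as "decomposes under Unicode NFD into a base letter plus at least one combining mark."

```python
import unicodedata


def diacritic_rate(text: str) -> float:
    """Return the fraction of alphabetic characters that carry a diacritic.

    Hypothetical helper: a letter counts as "marked" if its NFD
    decomposition contains a combining character (e.g. 'ó' -> 'o' + U+0301,
    'ñ' -> 'n' + U+0303). Non-letters (spaces, digits, punctuation) are ignored.
    """
    letters = 0
    marked = 0
    for ch in text:
        if not ch.isalpha():
            continue
        letters += 1
        decomposed = unicodedata.normalize("NFD", ch)
        # unicodedata.combining() returns a nonzero combining class for marks
        if any(unicodedata.combining(c) for c in decomposed):
            marked += 1
    return marked / letters if letters else 0.0


if __name__ == "__main__":
    # Two renderings of the same sentence, with and without diacritics
    for sample in ("Ya terminó.", "Ya termino."):
        print(sample, round(diacritic_rate(sample), 3))
```

Running this over comparable samples of, say, newspaper text and email in the same language would give one crude point of comparison. It measures only how often diacritics appear, not whether their absence causes ambiguity, which is the harder question this thread is about.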

    I'm a pretty strong believer in context as a disambiguator, and human
    beings are amazingly talented at correctly going beyond the information
    given. So my hunch is that a great deal of text without diacritics can
    still be unambiguously understood by the majority of readers. In fact, if
    Spanish or Czech (or any other language that uses diacritics) email messages
    are often sent without diacritics, then I take this as an existence proof
    that, to some extent, they are not needed for satisfactory comprehension.

    -bruce

    At 11:50 AM 4/19/01 -0700, Rene.Valdes@lhsl.com wrote:

    >In support of Monika's argument, I'll offer the following two sentences:
    >
    > Ya termino. (I'm finishing soon.)
    > Ya terminó. (It's already finished.)
    >
    >Without the diacritic, you would not be able to tell which one of these two
    >meanings to assign to this sentence. I use diacritics whenever possible,
    >even at the risk of having my text become garbage when it travels through
    >cyberspace.
    >
    >Another interesting case is the very important distinction between año and
    >ano, two nouns with quite different meanings.
    >
    >René Valdés
    >San Diego, California
    >USA
    >
    >Monika Merino wrote:
    > As a native speaker of Spanish I can tell you that ALL Spanish speakers would
    > face terrible comprehension problems without diacritics. In many cases,
    > diacritics in Spanish are used to "distinguish" homonyms. Take for example
    > these two cases:
    > El niño *se* cayó (The boy fell down)
    > *Sé* que será difícil entenderlo (I know it's going to be difficult to
    > understand it)
    > In the first case we're talking about the reflexive pronoun *se* (part of
    > the verb "caerse", "to fall down"), whereas in the second case we're talking
    > about the first person singular conjugation of the verb "to know" (saber).
    > Perhaps in isolated sentences like these two, and in the "relaxed" and rather
    > artificial situation of "reading examples", these diacritics might not seem
    > crucial for comprehension. But I can't imagine what it would be like to have
    > a 5,000-word Spanish text with no diacritics! It would take ages for native
    > speakers of a language with diacritics to get used to one without them! And
    > anyway, what's the problem with diacritics?
    > Monica Merino



    This archive was generated by hypermail 2b29 : Fri Apr 20 2001 - 00:02:36 MET DST