Re: Corpora: a particular type of sloppiness

From: Rene.Valdes@lhsl.com
Date: Fri Apr 20 2001 - 02:20:23 MET DST

    Why should we define informal text as email? Email is text written with a
    keyboard that often lacks the keys needed for convenient input of
    diacritics. Email is text written with the purpose of sending your message
    across multiple systems that are very likely to convert your characters
    with diacritics into garbage. Isn't a quick handwritten note informal
    text? A Spanish speaker would never leave the diacritics out of such a
    note. If using a keyboard (like the ones in old typewriters) with which it
    is easy to enter the diacritics, no Spanish speaker would ever consider
    leaving them out. I assume this is also the case for German, Polish,
    Czech, Hungarian, Portuguese, and any other language using such marks.

    The problem is one of hegemony of a keyboard and a system designed with
    English in mind, and the desire by some English speakers to impose on the
    users of other languages a need for disambiguation that is alien to them
    and would not even be in question if we were just to implement the
    appropriate means to input diacritics and transmit them across the
    internet.

    -René (with an accent, otherwise it would be pronounced differently and
    would indeed have a different meaning, with no chance for disambiguation)

    Bruce Lambert <lambertb@uic.edu> on 04/19/2001 03:06:34 PM

    To: Rene.Valdes@lhsl.com, corpora@hd.uib.no
    Subject: Re: Corpora: a particular type of sloppiness

    Isn't this question of diacritics, at some level at least, an empirical
    one? That is, how does frequency of diacritic use vary in formal (e.g.,
    Spanish newspaper text) vs. informal (e.g., email) text? I know there are
    lots of subtleties that would need to be worked out to make any such
    comparison valid, but it would be interesting nonetheless to see how common
    or uncommon the use of diacritics is in various languages that use them.
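
    As a rough illustration of how such a comparison could be started, here
    is a minimal Python sketch that measures the share of letters carrying a
    diacritic in a piece of text. The function name diacritic_rate is
    hypothetical, and the approach (Unicode NFD decomposition) is only one
    possible assumption: letters such as "ł" or "ø" do not decompose and
    would not be counted, and none of the subtleties of building comparable
    formal and informal samples are addressed.

```python
import unicodedata


def diacritic_rate(text: str) -> float:
    """Return the fraction of letters in `text` that carry a combining mark.

    Hypothetical helper for illustration only. A letter counts as marked
    when its NFD decomposition contains at least one combining character
    (for example "ó" becomes "o" + U+0301).
    """
    letters = 0
    marked = 0
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation, bare combining marks
        letters += 1
        decomposed = unicodedata.normalize("NFD", ch)
        if any(unicodedata.combining(c) for c in decomposed):
            marked += 1
    return marked / letters if letters else 0.0


# Toy inputs taken from the examples quoted in this thread.
with_marks = diacritic_rate("Ya terminó.")
without_marks = diacritic_rate("Ya termino.")
```

    Running the same function over a newspaper sample and an email sample of
    the same language would give two rates to compare, although a fair
    comparison would also have to control for vocabulary and topic.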

    I'm a pretty strong believer in context as a disambiguator, and human
    beings are amazingly talented at correctly going beyond the information
    given. So my hunch is that a great deal of text without diacritics can
    still be unambiguously understood by the majority of readers. In fact, if
    Spanish or Czech (or whatever language that uses diacritics) email messages
    are often sent without diacritics, then I take this as an existence proof
    that, to some extent, they are not needed for satisfactory comprehension.
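
    One way to put a number on "to some extent" is to count how many distinct
    words in a word list collapse into the same string once diacritics are
    removed. The sketch below does this for a toy list built from the
    examples in this thread. The names strip_diacritics and collisions are
    hypothetical, the word list is not a real lexicon, and the sketch says
    nothing about how often context resolves each collision.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks from `word` (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", bare)


def collisions(words):
    """Group words that become identical once diacritics are stripped.

    Returns a dict mapping each stripped form to the sorted list of
    distinct original words, keeping only forms shared by two or more.
    """
    groups = defaultdict(set)
    for w in words:
        groups[strip_diacritics(w)].add(unicodedata.normalize("NFC", w))
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Toy word list drawn from the examples quoted in this thread.
toy_words = ["termino", "terminó", "año", "ano", "se", "sé", "niño"]
ambiguous = collisions(toy_words)
```

    On a real lexicon, the ratio of colliding forms to all forms would be a
    first estimate of how much information the diacritics carry at the word
    level.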

    -bruce

    At 11:50 AM 4/19/01 -0700, Rene.Valdes@lhsl.com wrote:

    >In support of Monika's argument, I'll offer the following two sentences:
    >
    > Ya termino. (I'm finishing soon.)
    > Ya terminó. (It's already finished.)
    >
    >Without the diacritic, you would not be able to tell which one of these
    >two meanings to assign to this sentence. I use diacritics whenever
    >possible, even at the risk of having my text become garbage when it
    >travels through cyberspace.
    >
    >Another interesting case is the very important distinction between año and
    >ano, two nouns with quite different meanings.
    >
    >René Valdés
    >San Diego, California
    >USA
    >
    >Monika Merino wrote:
    > As a native speaker of Spanish I can tell you that ALL Spanish speakers
    > would face terrible comprehension problems without diacritics. In many
    > cases, diacritics in Spanish are used to "distinguish" homonyms. Take
    > for example these two cases:
    > El niño *se* cayó (The boy fell down)
    > *Sé* que será difícil entenderlo (I know it's going to be difficult to
    > understand)
    > In the first case we're talking about the reflexive pronoun "se",
    > whereas in the second case we're talking about the first person singular
    > conjugation of the verb "to know". Perhaps in isolated sentences like
    > these two, and in the "relaxed" and rather artificial situation of
    > "reading examples", these diacritics might not seem crucial for
    > comprehension. But I can't imagine what it would be like to have a
    > 5,000-word Spanish text with no diacritics! It would take ages for
    > native speakers of a language with diacritics to get used to one without
    > them! And anyway, what's the problem with diacritics?
    > Monica Merino



    This archive was generated by hypermail 2b29 : Fri Apr 20 2001 - 02:20:07 MET DST