Corpora: Arabic vs Spanish diacritics

From: Tim Buckwalter (TimBuckwalter@aol.com)
Date: Mon Apr 23 2001 - 19:41:34 MET DST


    Arabic short vowels and diacritics are zero-width optional elements that
    are used occasionally to disambiguate homographs when there is
    insufficient context for the reader to do so. A good writer anticipates
    these potential ambiguities and inserts short vowels and diacritics as
    needed, such as to disambiguate the Arabic for "Amman" and "Oman" or to
    indicate the passive voice. Occasionally one hears professional news
    announcers pause and backtrack to re-read a passage with a different
    "vocalization" of a word. Short vowels and diacritics are useful for
    learners, but once you know the language they are more of a hindrance.
    Arabs restrict their use mainly to poetry and religious texts.

    The big difference between Arabic and accented languages such as Spanish
    in this regard is that accent-less Spanish is probably sub-standard or
    at least informal orthography. It is the norm for an entire formal
    Arabic newspaper to have only a dozen or so thoughtfully placed short
    vowels and diacritics, whereas an unaccented Spanish newspaper would be
    hard to imagine (I've never seen one, at least), as would one with
    accents placed only where there is not enough context to know what is
    intended.
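
    For anyone who wants to see the homograph effect in corpus terms, here
    is a minimal sketch in Python. It assumes that stripping Unicode
    combining marks (category Mn) after NFD decomposition is an acceptable
    stand-in for "unvocalized" or "unaccented" text. The helper name
    strip_marks and the sample spellings are illustrative choices, not
    something taken from the thread or from any particular tool.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks (Unicode category Mn) after NFD decomposition.

    Assumption: this approximates unvocalized Arabic and unaccented
    Spanish. It also removes marks such as a decomposed hamza or the
    tilde of n-with-tilde, which a real pipeline may want to keep.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Vocalized spellings of "Amman" and "Oman", written as escapes so the
# individual code points are visible (illustrative vocalizations).
AMMAN = "\u0639\u064E\u0645\u0651\u064E\u0627\u0646"  # ain+fatha, mim+shadda+fatha, alif, nun
OMAN = "\u0639\u064F\u0645\u064E\u0627\u0646"         # ain+damma, mim+fatha, alif, nun

# Both collapse to the same four bare letters once the marks are removed.
print(strip_marks(AMMAN) == strip_marks(OMAN))

# The Spanish pairs quoted in this thread collapse in the same way.
print(strip_marks("termin\u00f3") == "termino")
print(strip_marks("a\u00f1o") == "ano")
```

    The sketch only shows that the distinctions disappear at the string
    level; whether a reader can recover them from context is the
    linguistic question discussed above.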

    My impression is that the Arabs will make less and less use of these
    short vowels and diacritics in the future, possibly even dropping them
    entirely (as the Israelis have done with modern Hebrew). In our
    discussions with cell phone manufacturers I have noted the general
    expectation that text input on mobile devices will neither display nor
    provide an input method for Arabic short vowels and diacritics.

    Tim Buckwalter
    Senior Language Engineer
    AOL Mobile (formerly Tegic)
    1000 Dexter Ave N, Suite 300
    Seattle, WA 98109-3574
    206.268.7552 phone
    206.343.7004 fax
    206.343.7001 front desk
    TimBuckwalter@aol.com
    www.tegic.com

    Steven Krauwer wrote:
    >
    > Rene.Valdes@lhsl.com wrote:
    > >
    > > In support of Monika's argument, I'll offer the following two sentences:
    > >
    > > Ya termino. (I'm finishing soon.)
    > > Ya terminó. (It's already finished.)
    > >
    > > Without the diacritic, you would not be able to tell which one of these two
    > > meanings to assign to this sentence. I use diacritics whenever possible,
    > > even at the risk of having my text become garbage when it travels through
    > > cyberspace.
    > >
    > > Another interesting case is the very important distinction between año and
    > > ano, two nouns with quite different meanings.
    >
    > You're the experts, so I won't even dream of challenging what
    > you are saying, but I am really curious to hear the opinion of
    > colleagues from the Arabic speaking world, as they seem to be
    > able to live happily with unvocalized written texts.
    >
    > Should I infer that Spanish is more ambiguous in this respect,
    > or that Arabic speakers (or rather: readers) are more tolerant,
    > or that Spanish diacritics and Arabic vowels are different
    > animals?
    >
    > Steven


