Corpora: Re: Typing accents in Windows & "optionality" of diacritic marks

From: Trond Trosterud (trond.trosterud@hum.uit.no)
Date: Fri Apr 20 2001 - 12:46:51 MET DST

    >If you can tell me how to type them in Word under Windows on a US
    >English keyboard I would appreciate it.

    In most cases you do not need to go to the symbol window.

    On your control panel (from the MS menu) you will find an icon called
    keyboards. You go there, look for the keyboard layouts you want, and select
    them. A little lg abbreviation then appears in the bottom right corner,
    telling you which keyboard layout is in use. The same goes for the Mac
    (starting from the apple menu). I use 3 different keyboard layouts on my
    Mac (Norwegian, Finnish and Sámi), and the US keyboard is available as well.

    There is one possible (but unlikely) obstacle: If Microsoft has decided
    that US OS users shall be protected from such temptations to look at other
    lgs (i.e. if this handy mechanism is not available to US OS's) I suggest
    you do something with it, start a campaign or whatever.

    It is a long-standing difference between PC and Mac that the former does
    not let you access the characters of your code table from other keyboard
    layouts easily, whereas the latter gives you the whole 8-bit code table.
    Thus, the Mac had a multilingual approach from the very beginning, as
    opposed to the monolingually designed PC. Even PC users can access accented
    letters without changing keyboard layout, though. At least my Norwegian PC
    keyboard layout gives me access to Spanish and other accented vowels (acute,
    grave, diaeresis, circumflex, tilde + vowel) via AltGr + the dead keys D12
    and E12. I can only hope that colleagues in Los Angeles are equally well
    provided for, sitting there with their US keyboards.

    Since genuine 7-bit systems are really rare, what monolingual English users
    need to do to get access to all the Western European lgs (save the Gaelic
    ones, there you need other measures) is to configure their e-mail system to
    use the 8-bit code table ISO/IEC 8859-1, or Latin 1. Thus, Geoffrey Sampson's
    otherwise fine defense of linguistic rights is headed by the rather sad
    message "X-Sun-Charset: US-ASCII", which translates to "English and
    Indonesian only" (the only two lgs on earth for which US-ASCII is enough;
    and if you don't accept writing "role" for "rôle", you are left with
    Indonesian). Well, if he can read my Latin 1, it is OK, of course.
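
The difference between the 7-bit and the 8-bit code table can be tried out in a few lines. What follows is only a minimal sketch, assuming Python 3; the helper name `encodable` and the word list are made up for illustration and are not part of any mail system mentioned here.

```python
def encodable(text: str, charset: str) -> bool:
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample words: one plain ASCII word, three needing Latin 1.
words = ["role", "rôle", "råne", "Tromsø"]

for word in words:
    # Columns: the word, fits in US-ASCII?, fits in Latin 1?
    print(word, encodable(word, "ascii"), encodable(word, "latin-1"))
```

Only "role" fits into US-ASCII, whereas all four words fit into Latin 1.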

    Russian, Japanese, Eastern European etc. users may have problems with Latin
    1 (but with proper email client settings the text will come through). That
    is one reason why ISO/IEC 10646, or Unicode, was invented. And here, Win9x
    and above have the lead over Macintosh (OS X will bridge the gap). In Win9x
    or above, you can read every letter of every lg. By going to the symbol
    window you will be able to insert the relevant characters, in a cumbersome
    way, but you will not be able to make keyboards for character collections
    that do not have an 8-bit MS codetable. Receiving info, but not producing
    it, is the somewhat Orwellian style.
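
The kind of problem an Eastern European reader may run into with Latin 1 can be shown with a small sketch, again assuming Python 3 and using its standard codec names `iso8859_1` (Latin 1) and `iso8859_2` (Latin 2). The sample string is my own choice, not taken from any real message.

```python
text = "råne, Tromsø"

# Bytes as a sender configured for Latin 1 would put them on the wire.
raw = text.encode("iso8859_1")

# The same bytes read by a client that assumes Latin 2 (Eastern European).
misread = raw.decode("iso8859_2")
print(misread)  # rĺne, Tromsř

# With a correct charset label the receiver decodes the bytes properly.
print(raw.decode("iso8859_1"))  # råne, Tromsø

# In Unicode (here as UTF-8) both sets of letters can sit in the same text.
both = text + " / " + misread
print(both.encode("utf-8").decode("utf-8") == both)  # True
```

The bytes never change; only the code table used to interpret them does, which is why a proper charset declaration in the mail header matters.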

    Then an important note on the "optionality of diacritic marks". This is
    nonsense. The ring above my Norwegian a is just as optional as the bar
    across the English l. Thus, Norwegian "rane" and "råne" are as distinct as
    English "tie" and "lie". It is true, though, that you can read a Norwegian
    text without the Norwegian letters, just as you can read an English text by
    exchanging all y-s with i (tri it for iourselves). But we would rather not.
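
What is lost when diacritics are treated as optional can be made concrete with a short sketch, assuming Python 3 and its standard `unicodedata` module. The function name `strip_diacritics` is hypothetical; it imitates what an ASCII-only pipeline does to the text.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose each letter and drop the combining marks (ring, acute, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("råne"))  # rane

# Two distinct Norwegian words collapse into one string.
print(strip_diacritics("rane") == strip_diacritics("råne"))  # True
```

After stripping, a search for "rane" in the corpus can no longer tell the two words apart.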

    I cannot but hope that a majority of my colleagues will find this trivial.
    What has proven not to be trivial, though, is the effort to standardise
    text encoding in corpora. Since this is a corpus list, where many obviously are
    stuck in ASCII, I strongly urge you to encode your corpora with Unicode.
    There you simply will find the letters you need, from Chuvash via IPA to
    African clicks and Cherokee. And your readers will be able to read your
    corpus as well.
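
Storing a corpus in Unicode could look roughly like this. It is a sketch under assumptions: Python 3, UTF-8 as the Unicode encoding form, and a hypothetical `round_trip` helper with sample characters I picked myself (a Chuvash letter, an IPA symbol, a click letter, a Cherokee syllable and a Norwegian word).

```python
import os
import tempfile

# Hypothetical corpus lines, given as escapes so the code points are explicit.
SAMPLES = [
    "\u04d1",      # Cyrillic a with breve, used in Chuvash
    "\u0283",      # IPA esh
    "\u01c3",      # click letter
    "\u13e3",      # Cherokee syllable
    "r\u00e5ne",   # Norwegian "råne"
]


def round_trip(lines, encoding="utf-8"):
    """Write the lines to a temporary file and read them back unchanged."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline="\n") as out:
            out.write("\n".join(lines))
        with open(path, "r", encoding=encoding, newline="\n") as src:
            return src.read().split("\n")
    finally:
        os.remove(path)


# True when every character survived storage in UTF-8.
print(round_trip(SAMPLES) == SAMPLES)
```

Calling `round_trip(SAMPLES, encoding="latin-1")` raises `UnicodeEncodeError`, since most of these letters have no place in an 8-bit code table.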

    -------------------------------------------------------------------
    Trond Trosterud t +47 7764 4763
    Det humanistiske fakultet h +47 7767 3639
    N-9037 Universitetet i Tromsø, Noreg f +47 7764 4239
    Trond.Trosterud@hum.uit.no http://www.hum.uit.no/a/trond/
    -------------------------------------------------------------------


