[Corpora-List] corpora and new language classifications

From: Yuri Tambovtsev (yutamb@mail.cis.ru)
Date: Sat Jul 12 2003 - 13:00:31 MET DST


    Corpora and a new classification of the Uralic languages
    The main problem in constructing corpora is classification of one sort or
    another. Indeed, classification may be called the aim of linguistics in
    general: a linguist must classify sounds, phonemes, words, sentences, meanings,
    and so on. Perhaps the most important classification problem in linguistics,
    however, is grouping the world's 6000 languages and dialects into subgroups,
    groups, families, super-families, phyla, etc. The main language families were
    established long ago, and some of them need revising. For many reasons,
    reconsidering an accepted classification is surely one of the hardest jobs in
    linguistics. I have heard that an attempt at this hard and risky job has been
    made by Dr. Angela Marcantonio of the University of Rome, who has tried to
    reconsider the Uralic language family in her recent book (The Uralic Language
    Family: Facts, Myths and Statistics. Oxford, UK and Boston, USA: Blackwell
    Publishers, 2002, 335 pages). I wish I could read it, but it is not available
    in Novosibirsk, Russia. The Uralic language family is said to consist of the
    Finno-Ugric and Samoyedic languages. I suspect that the Uralic family may not
    be a real family, but rather a conglomerate of Finnic, Ugric and Samoyedic
    languages. My phonostatistical data on this language group lead me to believe
    that one should be very cautious when speaking of the Uralic languages as one
    family. In particular, the values of the coefficient of variation of 8
    consonantal groups (labial, front, palatal, velar, sonorant, occlusive,
    fricative and voiced) SHOW THAT ITS BODY IS RATHER DISPERSED, i.e. not
    compact. In fact, this group is less compact than other language families.
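    The coefficient of variation is the standard deviation divided by the mean,
    expressed as a percentage, so a lower value means the languages of a family
    are closer to one another. This message does not say exactly how the 8
    per-group values are combined into one figure per family, so the Python
    sketch below simply assumes that each consonantal group's frequency is
    compared across the languages of a family and the per-group coefficients are
    then averaged, using the sample standard deviation. The function names, the
    data layout and the toy numbers are hypothetical illustrations, not the
    author's actual procedure or data.

```python
from statistics import mean, stdev

# The eight consonantal groups named in the message.
CONSONANT_GROUPS = ("labial", "front", "palatal", "velar",
                    "sonorant", "occlusive", "fricative", "voiced")


def coefficient_of_variation(values):
    """Return V = s / mean * 100 (%), where s is the sample standard deviation."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two languages to measure variation")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m * 100.0


def family_compactness(freqs_by_language, groups=CONSONANT_GROUPS):
    """Hypothetical aggregate: the mean, over consonantal groups, of each
    group's coefficient of variation across the languages of one family.

    `freqs_by_language` maps language name -> {group name: frequency in %}.
    Lower values mean a more compact (less dispersed) family.
    """
    per_group = [
        coefficient_of_variation(freqs[g] for freqs in freqs_by_language.values())
        for g in groups
    ]
    return mean(per_group)


# Toy usage with invented numbers (NOT real data) for two of the groups only.
toy_family = {
    "Language A": {"labial": 10.0, "velar": 20.0},
    "Language B": {"labial": 12.0, "velar": 20.0},
    "Language C": {"labial": 14.0, "velar": 20.0},
}
print(family_compactness(toy_family, groups=("labial", "velar")))
```

    Replacing `stdev` with `statistics.pstdev` gives the population version
    instead; which variant the cited book uses is not stated here.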
    Let us compare the coefficients of variation of several language families:
    Uralic - 28.31%
    Mongolic - 10.78%
    Samoyed - 18.29%
    Turkic - 18.77%
    Finno-Ugric - 24.14%
    Altaic - 25.97%
    Thus the Uralic group as a whole is less compact than either of its own
    branches, Finno-Ugric or Samoyedic, and its coefficient of variation is more
    than twice that of the Mongolic family (28.31% vs. 10.78%). Details on the
    compactness of other language groups can be found in my recent book (Yuri A.
    Tambovtsev. The Typology of Functioning of Phonemes in the Sound Chain of
    Indo-European, Paleo-Asiatic, Ural-Altaic, and Other World Languages: the
    compactness of Groups, Families and the other Language Taxons. Novosibirsk:
    SN Institute, 2003, 143 pages. In Russian).
    I would like to ask my colleagues in linguistics to share their opinions on
    Dr. Angela Marcantonio's book. Should we reconsider the commonly accepted
    language families? If so, on the basis of what data and what methods? I look
    forward to hearing from you soon at yutamb@hotmail.com.
    Yours sincerely,
    Yuri Tambovtsev, Novosibirsk, Russia



    This archive was generated by hypermail 2b29 : Sat Jul 12 2003 - 12:50:15 MET DST