[Corpora-List] Russian Corpora at Russian Congress

From: Rykov V.V. (Moscow) (rykov@narod.ru)
Date: Mon Mar 22 2004 - 13:02:54 MET


       The 2nd Russian Language congress took place last week:

        http://www.philol.msu.ru/~rlc2004/en/inflet/index.php

        Soon all the reports will be on the site.

        I include the English abstract of my own report below.

        My main idea is the following. When I investigated the Brown University Corpus, I could see what Americans REALLY read.

       Still, here and there people try to select the best text samples for their corpora, even when they want those corpora to represent the real speech activity in society. In doing so they replace the real pattern of human communication with their own idea of what it should be.

    --------------

       
    CORPUS OF TEXTS – A NEW TYPE OF WORD UNITY

    Rykov V.V. rykov2000@mail.ru

    Key words: text corpus, corpus linguistics, general philology, speech medium, speech texture, writing tools, representativeness.

    The term “text corpus”, or simply “corpus”, is now in frequent use, and corpora are very often the source for many kinds of empirical and theoretical research. Nevertheless, some important properties of corpora still have to be properly defined. Many people use the word in different ways, which leads to incorrect use of corpora and hence to misinterpretation of research results. The purpose of this paper is to specify the meaning of the term “text corpus” and thereby to clarify the nature of the text corpus itself as a special kind of word unity. The standard definition contains four properties: machine-readable form, sampling and representativeness, finite size, and standard reference. This paper discusses all of these features using modern philological paradigms, paying special attention to sampling and representativeness.
    Sampling procedures that follow the so-called corpus design criteria should ensure that the corpus texts representatively reflect the philological phenomena that the initial corpus design and the later sampling were meant to capture. This is the central point of the corpus definition under discussion.

    -- 
    

    Regards, Vladimir Rykov

    PhD in Computational Linguistics

    Personal web-site: rykov.narod.ru English version: www.blkbox.com/~gigawatt/rykov.html




    This archive was generated by hypermail 2b29 : Mon Mar 22 2004 - 13:17:41 MET