RE: Corpora: when does a subcorpus become a corpus

From: Rykov V.V. (rykov@narod.ru)
Date: Fri Jan 04 2002 - 14:05:44 MET

      I would like to give a small example. There was a report here on the distribution of the meanings of the verb "moch'". This Russian verb has two main meanings: "can" and "may".

      Would my distributions, based on corpora such as the Corpus of Russian Proverbs, Political metaphors, or Russian newspapers, have any value or, in other words, tell us something about the Russian language as a whole? I think that proof of this can be given by the texts of a general, carefully compiled, balanced, representative corpus of the Russian language.
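
      One way to make such a comparison concrete would be a chi-square test on the sense counts from a special corpus and from a balanced one. What follows is only a minimal sketch: the function names and all the counts are invented for illustration and are not results from any real corpus.

```python
def sense_proportions(counts):
    """Turn raw sense counts into relative frequencies."""
    total = sum(counts.values())
    return {sense: n / total for sense, n in counts.items()}


def chi_square(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of sense counts."""
    senses = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand = total_a + total_b
    stat = 0.0
    for sense in senses:
        a = counts_a.get(sense, 0)
        b = counts_b.get(sense, 0)
        column = a + b
        if column == 0:
            continue  # a sense absent from both corpora contributes nothing
        for row_total, observed in ((total_a, a), (total_b, b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts of the two senses of "moch'" (invented, not measured).
proverbs = {"can": 80, "may": 20}   # stand-in for a special purpose corpus
balanced = {"can": 60, "may": 40}   # stand-in for a balanced reference corpus

print(sense_proportions(proverbs))
print(sense_proportions(balanced))
print(round(chi_square(proverbs, balanced), 2))
```

      With these invented counts the statistic is about 9.52, which is above 3.84, the 5% threshold for one degree of freedom, so the two distributions would count as different. A value near zero would instead suggest that the special corpus agrees with the balanced one on this point.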

    >Well I guess I tried to focus on the issue of representativeness rather
    >than the proper nomination for the set of texts, but, yes, probably the
    >proper term might be 'special purpose corpus'. This, however, raises
    >another interesting question. I personally would hope that every single
    >corpus had been compiled for a particular purpose. Indeed, I wonder if
    >there really IS such thing as a 'general corpus'? I have a feeling that so
    >called 'general corpora' - if they exist - are pretty useless in general,
    >unless they're modified for a particular purpose or task. I suppose that in
    >empirical research you always have to choose your "object" (material)
    >according to your subject, and not to use "just something", i.e. you have
    >to know your material: I guess no one would try to determine the average
    >height of human beings on the basis of a basketball team. The problem with
    >language is that exceptions are often not evident and not easily detected
    >since there is no clear "reference set" for language. In principle, if your
    >findings are truly generalizable you should get similar results from any
    >corpus, although there is obviously more "noise" in more "general" corpora.
    >Am I right? Or am I pedant? Or both. ( About the "Terms in Context" - which
    >I do have read more than up to p. 45 :-) -, I liked the book, and I think I
    >could make use of some chapters in my course on corpora as translation tools. )
    >
    >sincerely,
    >Sampo
    >
    >At 09:54 4.1.2002 +0100, Pearson, Jennifer wrote:
    >>If you look at the same publication, p.48, you will find that I argue that,
    >>given Sinclair's definitions, neither the term subcorpus nor the term
    >>component is appropriate for the sets of texts I was working with (and
    >>probably not for the EAP texts referred to in previous e-mails either). I
    >>chose therefore to use the term special purpose corpus, "a corpus whose
    >>composition is determined by the precise purpose for which it is to be used.
    >>While a special purpose corpus may be derived from a general reference
    >>corpus or from a monitor corpus it will not constitute a subcorpus in the
    >>sense defined by Sinclair because it will not have all of the properties of
    >>a larger corpus." I coined this particular term for two reasons, a) because
    >>the language of the texts I was working with could be classified as
    >>'language for special purposes' or 'LSP', two terms that already existed in
    >>applied linguistics to designate, for example, the language of business, the
    >>language of medicine, the language of economics, and b) because the term
    >>'special purpose corpus' implies that the corpus has been compiled for a
    >>particular purpose.
    >>Wishing you all a happy new year
    >>Jennifer
    >>
    >>Dr Jennifer Pearson
    >>Chief of Translation
    >>UNESCO
    >>7 Place de Fontenoy
    >>75352 Paris 07
    >>Tel:. 00 33 1 456 80 780
    >>e-mail: j.pearson@unesco.org
    >>http://www.unesco.org
    >

    -- 
    Vladimir Rykov, PhD in Comp Linguistics, 
     MOSCOW
    http://rykov.narod.ru/
    Engl. http://www.blkbox.com/~gigawatt/rykov.html
    Tel +7-903-749-19-99
    


