Re: Corpora: Historical background of Corpus Linguistics

From: Ute Römer
Date: Thu Apr 18 2002 - 15:13:11 MET DST

    Dear Eric and others,

    Some early (or earlier) pre-electronic corpus-based studies that come to mind
    are E.L. Thorndike's 1921 work on relative word frequencies (if I remember
    correctly, he used a corpus of more than 4 million words!): Teacher's Word Book.
    New York: Columbia Teachers College, and (much earlier) A. Cruden's 1796 (!)
    Complete Concordance to the Old and New Testaments. Also worth mentioning is the
    (later) work by Michael West: 1953, A General Service List of English Words.
    London: Longman. And then, of course, there is Otto Jespersen, who also used
    corpus data to compile his grammar, published 1909-1949.

    I can't help with the 'where-to-get-Markov's-paper-problem' but I can recommend the
    site where I got my own Zipf and my own Firths!

    Hope this helps!

    Best wishes.... Ute

    Eric Atwell wrote:

    > Ramesh said:
    > > ... perhaps *the* earliest publication of linguistic research using an
    > > electronic corpus was: ...
    > ...but don't forget even earlier Corpus Linguistics research done
    > without computers. For example modern Language Engineering researchers
    > extract Zipf distributions and Markov models from corpora; this was
    > done earlier "by hand" :
    > Zipf, George Kingsley (1936) "The psycho-biology of language : an
    > introduction to dynamic philology" London : G. Routledge & sons
    > Markov, A.A. (1913) "Essai d'une recherche statistique sur le texte du
    > roman 'Eugene Onegin' illustrant la liaison des epreuves en chaine"
    > Izvestia Imperatorskoi Akademii Nauk (Bulletin de l'Academie Imperiale
    > des Sciences de St-Petersbourg) 7:153-162.
    > Does anyone have an earlier citation???
    > Eric Atwell
    > PS Leeds library has the Zipf book but I don't actually have a copy of the Markov paper,
    > I copied the citation from Jurafsky&Martin(2000) "Speech and Language
    > Processing" Prentice Hall - can someone let me have a copy please PLEASE?
    > --
    > Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
    > School of Computing, University of Leeds, LEEDS LS2 9JT
    > TEL: 0113-2335430 MOBILE: 0775-1039104 FAX: 0113-2335468
