Dear Eric and others,
Some early pre-electronic corpus-based studies that come to mind
are E.L. Thorndike's 1921 work on the relative frequencies of words (if I remember
correctly, he used a corpus of more than 4 million words!): Teacher's Wordbook. New
York: Columbia Teachers College, and (much earlier) A. Cruden's 1796 (!) Complete
Concordance to the Old and New Testaments. Also worth mentioning is the (later)
work by Michael West: 1953, A General Service List of English Words. London:
Longman. And then, of course, there is Otto Jespersen, who also used corpus data
to compile his grammar, published from 1909 to 1949.
I can't help with the 'where-to-get-Markov's-paper-problem' but I can recommend the
site www.abebooks.com where I got my own Zipf and my own Firths!
Hope this helps!
Best wishes.... Ute
Eric Atwell wrote:
> Ramesh said:
> > ... perhaps *the* earliest publication of linguistic research using an
> > electronic corpus was: ...
> ...but don't forget even earlier Corpus Linguistics research done
> without computers. For example, modern Language Engineering researchers
> extract Zipf distributions and Markov models from corpora; this was
> done earlier "by hand":
> Zipf, George Kingsley (1936) "The psycho-biology of language : an
> introduction to dynamic philology" London : G. Routledge & sons
> Markov, A.A. (1913) "Essai d'une recherche statistique sur le texte du
> roman 'Eugene Onegin' illustrant la liaison des épreuves en chaîne"
> Izvestia Imperatorskoi Akademii Nauk (Bulletin de l'Academie Imperiale
> des Sciences de St-Petersbourg) 7:153-162.
> Does anyone have an earlier citation???
> Eric Atwell
> PS Leeds library has the Zipf book, but I don't actually have a copy of the
> Markov paper; I copied the citation from Jurafsky & Martin (2000) "Speech and
> Language Processing", Prentice Hall. Can someone let me have a copy, please?
> Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
> School of Computing, University of Leeds, LEEDS LS2 9JT
> TEL: 0113-2335430 MOBILE: 0775-1039104 FAX: 0113-2335468
> WWW: http://www.comp.leeds.ac.uk/eric EMAIL: email@example.com