Corpora: corpus of academic writing

David Coniam (coniam@cuhk.edu.hk)
Mon, 31 May 1999 16:16:48 +0800

My thanks to everyone who replied to my requests for info on corpora of academic writing. I got a number of useful responses; here's a short summary.

Paul Thompson (p.a.thompson@reading.ac.uk) is responsible for the Reading Academic Text (RAT) corpus - http://www.rdg.ac.uk/AcaDepts/cl/CALS/corpus.html
He is about to start expanding the corpus and is collecting transcripts of lectures. Another project involves collecting examples of undergraduate and Master's-level writing over the next few months.

Geoffrey Williams (Williams@ensinfo.univ-nantes.fr) pointed me to ELRA, the European Language Resources Association (http://www.icp.grenet.fr/ELRA/home.html), and also to Chris Gledhill at Stirling, who he says has a corpus of cancer research texts.

Ramesh Krishnamurthy of Cobuild (ramesh@clg.bham.ac.uk) told me that the academic subcorpus of the Bank of English is about 7m words (1m UK, 6m US).

Helmut Gruber (helmut@ling.univie.ac.at) is currently analysing a small corpus of Austrian students' academic writing (seminar papers) in German.

Jason Eisner (jeisner@unagi.cis.upenn.edu) pointed me to the paper archives at http://www.xxx.lanl.gov and to various online journals.


David Coniam
Chinese University of Hong Kong