Corpora: Corpus of scientific texts

DL (d.lee@lancaster.ac.uk)
Fri, 23 Oct 1998 03:00:55 +0100 (BST)

Recently, Chris Allen wrote:

> What I am after is a corpus of journal papers in the physical, chemical,
> biological or medical sciences.

And Alejandro Curado Fuentes wrote:

> I, like Chris Allen, am interested in a scientific-technical
> corpus, but, in my case, I'm looking for excerpts, chapters or parts from
> textbooks dealing with computer science, telecommunications, and
> information science.

I wonder why nobody has yet suggested looking into the British National
Corpus (BNC)... it may well contain enough material of this kind for your
purposes.

For example, there is a lot of material (classified as 'applied science')
from:

Journal of Gastroenterology and Hepatology (approx. 713,164 words in total)
The Lancet (135,850 words)
Computergram International (479,705 words)
New Scientist (864,701 words)

Under 'pure/natural science', there are the following:

British Medical Journal (Why is this not 'applied science', as with The
Lancet? I don't know) (449,961 words)
Nucleic Acids Research (92,055 words)
Nature (352,364 words)
Chemistry in Britain (192,593 words)

(Caveat: not all of the above are necessarily academic journals... some
look like 'popular' science magazines)

If you're doing research in an EU country, you are entitled to work
with the BNC (I'm not sure what the current policy for the 'rest of the
world' is... Lou Burnard?).

Hope this helps.

David Lee.

-------------------------------------------------------------------------
David Lee                     *****************************************
Dept of Linguistics           *                                       *
Lancaster University          *  "Modern society without organized    *
Lancaster LA1 4YT             *   religion would be like a crazed     *
England, UK.                  *   maniac without a chainsaw."         *
                              *                                       *
Email: D.Lee@lancaster.ac.uk  *****************************************
-------------------------------------------------------------------------