Re: Corpora: Corpora of scientific texts

Einat Amitay (einat@mpce.mq.edu.au)
Wed, 21 Oct 1998 17:20:31 +1000

Hi,

The SMART software and test collections at Cornell include the following small
corpora. You could try them, although they consist only of abstracts and you
might want something bigger.

Cranfield collection. 1398 abstracts (numbered 1 through 1400). Aerodynamics:
ftp://ftp.cs.cornell.edu/pub/smart/cran/

Medlars collection. 1033 abstracts (numbered 1 through 1033). Medical:
ftp://ftp.cs.cornell.edu/pub/smart/med/

Time magazine collection. World news articles from Time magazine (1963):
ftp://ftp.cs.cornell.edu/pub/smart/time/

Chris Allen wrote:

> I was wondering whether anyone on this list has any information about the
> creation of a corpus of scholarly scientific articles written in English. I
> am well aware that the Cobuild Bank of English includes a science subcorpus,
> but this is drawn from New Scientist magazine. The texts are what might
> loosely be termed 'popular science'.
>
> What I am after is a corpus of journal papers in the physical, chemical,
> biological or medical sciences. I'd be most grateful to hear from anyone
> with information about such a corpus.
>
> Best wishes,
>
> Chris Allen
> University of Halmstad
> Sweden
> direct tel. +46 35 167372 (office)
> +46 35 51527 (home)
> fax +46 35 129289
> http://www.hh.se

--
Einat Amitay
einat@mri.mq.edu.au
http://www.mri.mq.edu.au/~einat