Re: Corpora: Corpora of scientific texts

GCW (williams@ensinfo.univ-nantes.fr)
Wed, 21 Oct 1998 12:07:31 +0200 (MET DST)

On Wed, 21 Oct 1998, Chris Allen wrote:

> I was wondering whether anyone in this list has any information about the
> creation of a corpus of scholarly scientific articles written in English. I
> am well aware that the Cobuild Bank of English includes a science subcorpus
> but this is drawn from the New Scientist magazine. The texts are what might
> loosely be termed 'popular science'.

The problem with specialised corpora is one of copyright. My own corpus,
'BIVEG', of plant biology (described in Intl. Jnl. Corpus Linguistics
3/1:151-171) is a full-text corpus. Getting permissions was a nightmare, as
there are 154 research articles from a number of journals. To date I have
only restricted permissions. However, you might like to contact Chris
Gledhill (cjg6@st-and.ac.uk) at St Andrews, as I believe that the weight of
Aston University in copyright negotiations allows him to offer his corpus
more widely. His PSC corpus on cancer is full text and consists of research
articles, not popular science. I feel that full text is vital, as abstracts
are a very particular beast with limited lexico-grammatical choices that
are not representative of the full research article (RA). They are,
however, easier to obtain.

I would be interested to hear from anyone else working on full text RA
corpora to compare findings.

Geoffrey WILLIAMS
Colex
Faculte des Sciences et des Techniques
2 rue de la Houssiniere
BP 92208
44322 NANTES Cedex 3
France

williams@ensinfo.univ-nantes.fr

>
> What I am after is a corpus of journal papers in the physical, chemical,
> biological or medical sciences. I'd be most grateful to hear from anyone
> with information about such a corpus.
>
> Best wishes,
>
> Chris Allen
> University of Halmstad
> Sweden
> direct tel. +46 35 167372 (office)
> +46 35 51527 (home)
> fax +46 35 129289
> http://www.hh.se