Summary: Child language corpora

Nick Smith (nick@comp.lancs.ac.uk)
Tue, 5 Mar 1996 01:20:53 +0000 (GMT)

A couple of weeks back I asked for help in obtaining corpora of
children's language.

After a bit of a delay, I'd like to thank all those who responded -
Alan Morrison, Marie E. Helt, Gerry Knowles, Eric Atwell, Michael
Schillo, Adam Kilgarriff, Geoffrey Sampson, Sean ? -
and post the summary.

Nick Smith.

------------

(1) Polytechnic of Wales Corpus (POW)

"100,000 words of transcribed British English child-language data
sampled from 6-12 year olds, collected between 1978-84. The corpus is
balanced for sex, age, socio-economic status and strong second language
influence."

Recommended contacts are Robin Fawcett (fawcett@clu.abcy.cf.ac.uk), the
original compiler and analyzer of the POW Corpus, Tim O'Donoghue
(tim@canon.co.uk), Clive Souter (cs@scs.leeds.ac.uk) who wrote the
manual, and Eric Atwell (eric@scs.leeds.ac.uk).

Available from:

- Oxford Text Archive
Email Alan Morrison, archive@vax.ox.ac.uk

- or ICAME
http://www.hd.uib.no/corpora.html

(2) CHILDES

"A collection of utterances of children of different age groups. The
total size of the database is approximately 150 megabytes. The corpora
are divided into six major directories:
English data, non-English data, story-telling or narrative data, data
on language impairments, data from second language acquisition, and
data not transcribed." [MacWhinney, 1995]

MacWhinney, Brian, "The CHILDES project", Lawrence Erlbaum Associates,
pp.280, 1995.

See http://poppy.psy.cmu.edu/childes/

or: ftp://atila-ftp.uia.ac.be/pub/www/childes/

(latter recommended for those in Europe)

(3) Written Language collections

The following books were recommended to me by Geoffrey Sampson:

The Written Language of Nine and Ten-Year Old Children
The Written Language of Eleven and Twelve-Year Old Children

Nuffield Foreign Languages Teaching Materials Project,
Reports and Occasional Papers nos. 24 and 25, respectively

He adds: "Apart from a few pages of introduction including transcription
conventions they consist entirely of writing by children faithfully
reproduced, with crossings-out, misspellings, etc. all recorded. I
estimate that the total amount of children's writing is about 100,000
words."

Published in 1967 at 5 Lyddon Terrace, The University, Leeds 2.
Editors' names are not given.

Although they are not machine-readable and probably not very widely
available, they do concentrate on written language, which was the
main focus of my search. As far as I know, neither CHILDES nor POW
contains much writing by children (is this correct?).

- Nick.