Re: [Corpora-List] Looking for corpora suitable for research on language and gender...

From: Diana Maynard (d.maynard@dcs.shef.ac.uk)
Date: Thu Mar 25 2004 - 17:48:20 MET


    Hi Ute,
    There is some gender information in the spoken texts of the BNC.

    There are codes sdesex1 and sdesex2 representing Spoken:Demographic: Respondent
    Sex (male = 1, female = 2).
    I don't remember any more without looking at the files, but I think you can find
    the info in the bncfinder.dat file.

    I think there is probably only gender info for the demographic texts, i.e. where
    there is a single person speaking and/or responding.

    Regards
    Diana Maynard
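
As a rough illustration of how such sex codes could be used to split texts into male and female groups without reading each header, here is a minimal Python sketch. It does not reflect the real layout of bncfinder.dat, which is not described in this message: it assumes the relevant fields have already been exported to a tab-separated table with a header row. The column names `text_id` and `sdesex`, the function name, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Code values as given in the message: 1 = male, 2 = female.
SEX_LABELS = {"1": "male", "2": "female"}


def group_texts_by_sex(lines, id_column="text_id", sex_column="sdesex"):
    """Group text IDs by respondent sex code.

    `lines` is any iterable of tab-separated lines whose first line is a
    header. The column names are assumptions for this sketch, not the
    actual field names of any BNC index file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        code = (row.get(sex_column) or "").strip()
        # Texts with a missing or unrecognised code go into "unknown".
        label = SEX_LABELS.get(code, "unknown")
        groups[label].append(row[id_column])
    return dict(groups)


# Made-up sample data, for illustration only.
SAMPLE = "text_id\tsdesex\nT001\t1\nT002\t2\nT003\t\nT004\t2\n"

groups = group_texts_by_sex(io.StringIO(SAMPLE))
for label, ids in groups.items():
    print(label, ids)
```

Texts with no recorded sex code land in an "unknown" group rather than being dropped, so mixed or undocumented texts can still be inspected by hand.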

     

    On Thursday 25 Mar 2004 3:22 pm, Ute Römer wrote:
    > Dear All,
    >
    > I am in the process of preparing an introductory course on language and
    > gender and was thinking about compiling a "language and gender studies
    > corpus sampler" for my students so they can carry out some small-scale
    > empirical research projects to base their term papers on. For this sampler
    > it would be ideal to have spoken and/or written corpora with (roughly
    > comparable) male and female subsections, or just all-male/all-female
    > talk/writing corpora, or maybe even collections of exclusively gay and/or
    > lesbian language.
    >
    > I'm going to include a couple of small and specialised home-made corpora
    > (literary texts, book reviews, pop/rap song lyrics...), but would also like
    > to use larger and less specialised ones, such as COLT and (parts of) the
    > BNC. Does anyone know about a possibility to extract from these corpora
    > all-female and all-male conversations or male/female authored texts
    > (without having to read the headers of 4,000+ text files)? I had a look at
    > David Lee's "BNC Index" Excel spreadsheet but couldn't find sex indicators
    > for spoken texts (maybe most of them are mixed sex anyway). Also, I would
    > be grateful for pointers to other corpora which might be appropriate for
    > L&G-related research (MICASE online is already on my list; and I've
    > subdivided the transcript files of the Santa Barbara Corpus of Spoken
    > American English into male/female/mixed groups).
    >
    > Best wishes and thanks in advance... Ute
    >
    >
    > ************************************************************
    >
    > Ute Römer
    > English Department
    > University of Hanover
    > Königsworther Platz 1
    > 30167 Hannover
    > Germany
    >
    > Phone: +49 (0)511 762 2997
    > Fax: +49 (0)511 762 2996
    > E-mail: ute.roemer@anglistik.uni-hannover.de
    > http://www.fbls.uni-hannover.de/angli/


