Re: Corpora: Legal corpus?

From: David Lee (david_lee00@hotmail.com)
Date: Sat Aug 26 2000 - 02:12:58 MET DST


    [Hasn't someone asked this before, not long ago? Anyway..]

    Pulie,

    Assuming you're working with English, there are about 127,331 words (not
    a lot, by today's standards) in 13 files of courtroom proceedings
    (hearings/trials, including judges' summations) in the BNC, transcribed
    from spoken recordings. (I haven't come across any police interviews in
    the BNC.)

    However (and that's a big 'however'), it seems to me that some of the
    trials/court proceedings were split between 2 or more recordings
    and thus ended up in different files. This means that the 13 files
    probably only represent around 7 different 'cases' (estimate: I've
    obviously not checked in detail). This may or may not be a problem,
    depending on your research.

    The other *huge* problem (for anyone wanting to do even the most basic
    sociolinguistic research) is the almost complete absence of information
    about the participants recorded (e.g. age, sex, social class). The
    most we get is whether they were male or female (and more than half the
    time, we don't even get that) and their role (judge, solicitor, witness,
    plaintiff, defendant). (Plea to future corpus compilers: please
    scrupulously collect and record all the information you can get your
    hands on about your participants!)

    You might also want to look at the ICE-GB corpus:

    Legal cross-exams (dialogue) - 10 texts; 21,179 words
    Legal presentations (monologue) - 10 texts; 21,735 words
    Total: 42,914 words
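
    For anyone who wants to check the sums or set the two corpora side by
    side, here is a rough Python sketch. It uses only the figures quoted in
    this message (nothing is recomputed from the corpora themselves), and
    the variable names are my own, purely for illustration.

```python
# Figures as quoted in this message; they are not recomputed from the corpora.
# Each entry is (number of texts, number of words).
ice_gb_legal = {
    "Legal cross-exams (dialogue)": (10, 21_179),
    "Legal presentations (monologue)": (10, 21_735),
}
bnc_courtroom = (13, 127_331)  # 13 files, approximate word count

# Sum the ICE-GB legal categories.
ice_texts = sum(texts for texts, _ in ice_gb_legal.values())
ice_words = sum(words for _, words in ice_gb_legal.values())

# Combined courtroom material across both corpora.
combined = bnc_courtroom[1] + ice_words

print(f"ICE-GB legal: {ice_texts} texts; {ice_words:,} words")
print(f"BNC courtroom: {bnc_courtroom[0]} files; {bnc_courtroom[1]:,} words")
print(f"Combined: {combined:,} words")
```

    The ICE-GB total comes out at the 42,914 words given above, and the two
    sources together hold roughly 170,000 words of courtroom speech.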

    Thankfully, there is more information on the participants in some of
    the ICE-GB texts (fewer than half of them), but not much more. It would
    seem 'Unknown' or '---' is an acceptable value for sociolinguistic
    categories in many contemporary corpora... how sad. Confidentiality and
    difficulty in obtaining personal information from large numbers of
    strangers certainly constitute problems, but surely these are not
    insurmountable?

    Anyway, hope this helps.

    David Lee
    -----------------------------------------------------------------
    David YW Lee            **************************************
    Dept of Linguistics     * Stop the narrowing of minds        *
    Lancaster University    * Affirm the diversity of life       *
    Lancaster LA1 4YT       **************************************
    England, UK.

    Email: david_lee00@hotmail.com
    -----------------------------------------------------------------


