Re: [Corpora-List] Corpus Sanitation

From: FIDELHOLTZ DOOCHIN JAMES LAWRENCE (jfidel@siu.buap.mx)
Date: Fri Nov 29 2002 - 02:49:00 MET


    Hi all,
            I absolutely second Tony's post. In fact, I have issues in
    principle with anonymization, since the anonymization itself will
    obviously affect phonological aspects of the corpus. Likewise, it will
    tend to skew the distribution of proper nouns, as these are generally
    the words that get anonymized, and these are issues that particularly
    interest me. I know that people have addressed the general issue, and
    the ethical questions are real, but there must be some way around this
    problem.
                    Jim

    On Wed, 27 Nov 2002, Mcenery, Tony wrote:

    >Dear All,
    >
    >I was interested to read in the recent posting to the list by Zhiping Zheng
    >(see below) that he was uncertain as to whether he should make his corpus
    >publicly available because it contained some 'uncensored words' (Zhiping's
    >point 2). I guess that this means 'bad language' (I assume it does not relate
    >to anonymization issues as they are covered in Zhiping's point 1). If this is
    >about 'bleeping out' words in corpora, shouldn't we encourage Zhiping not to do
    >this? Surely we want corpora to contain uncensored speech? The point, for me,
    >of using corpora is to describe/account for language as it is, rather than
    >language as we wish it to be.
    >
    ...

    Blues great and cognitive scientist Robert Johnson on the mind/brain:
    "If ever I gotta bust your brains out, baby,
    Hoooo, It'll make you lose your mind."

    James L. Fidelholtz e-mail: jfidel@siu.buap.mx
    Posgrado en Ciencias del Lenguaje tel.: +(52-2)229-5500 x5705
    Instituto de Ciencias Sociales y Humanidades fax: +(01-2) 229-5681
    Benemérita Universidad Autónoma de Puebla, MÉXICO


