sum: names in corpora

Kristine Hasund (Kristine.Hasund@hia.no)
Wed, 07 May 1997 11:00:27 +0200

A while ago, I posted a message on Linguist and Corpora requesting
information about the different conventions that are used to protect the
identity of informants in spoken corpora.

I wish to thank the following people who kindly replied:

Gerald Nelson
Antoinette Renouf
Bernadette Vine
Svenja Sachweh
Susan Meredith Burt
Christine Cheepen
Marco Antonio Da Rocha
Bill Fisher
Dan I. Slobin

Below is a (longish!) summary of the answers I got to the following
questions:

- To what extent have last names, first names and addresses been erased
from tape/video recordings and replaced by fictitious names in the
transcriptions?

- If first names have been changed as well as last names and addresses,
what were the reasons given for doing so? (legal, ethical, or other)

- Were names changed manually or automatically (eg by means of a
"search-replace" word processor function)?

1) Gerald Nelson, University College London: The ICE corpus

Since 1990, I have been responsible for collecting, transcribing, and
digitizing the recordings for the British ICE corpus. Perhaps I should say
that all recording was non-surreptitious - speakers were required to
complete a form prior to recording, granting permission for its use in
academic research. The form also contained an option to have names changed
in the transcripts and in the recording.

In practice, very few speakers opted for anonymity. However, the authors
of personal and business letters very often chose anonymity. In all cases
we changed first and last names, as well as addresses.

Names were changed for legal reasons.

Names were changed manually during transcription, and explicitly marked as
changed names. In the digitized version, on CD, we concealed names by
putting a "beep" in the appropriate place.

2) Antoinette Renouf, University of Liverpool

I think that the point of your investigation, which I
understand to be that there is a sociolinguistic etc significance in
the first name choice, is not something that they will have considered
much. The question of preserving anonymity will almost certainly have
been overriding in their considerations, since they are largely
administrators and worried about legal issues.

I would have thought you would have more luck looking for work done by
linguists and sociolinguists on names per se. You might talk to Patrick
Hanks at OUP, because he did at least a surnames dictionary and maybe a
first name dictionary. Or try other lexicographers and editors of the
books of names for naming children.

Also, occasionally the newspapers announce that such and such are the
top ten names for babies in the country. They must get this info from
the registration of birth offices. I think sociolinguistics is the
place to look for research and bibliography.

3) Bernadette Vine, Victoria University of Wellington: The Wellington
Corpus of Spoken New Zealand English:

Real names have not been erased on the tapes/videos unless the people
who recorded the tape specifically requested them to be. In the
transcripts pseudonyms are always used unless the name is a matter of
public record i.e., broadcast material. Broadcasting material from
the Corpus is prohibited except where specific permission has been
obtained. Researchers using the Corpus have to sign a document saying
they will not disclose any information from the material they listen to.

First and last names have been changed and place names where this may
identify speakers. This was done because speakers were assured that
their identity would be protected.

Names were changed during the initial transcription stage. Generally
names with the same gender or ambiguity of gender as the real name,
stress patterns, number of syllables and ethnicity were used as
pseudonyms.
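
[Illustration added to this summary, not part of the reply.] The matching
criteria described above could be sketched roughly as follows in Python. The
candidate pool and the function name pick_pseudonym are hypothetical, and the
sketch assumes the transcriber supplies gender and syllable count by hand;
stress pattern and ethnicity are left out for brevity.

```python
# Hypothetical candidate pool: (name, gender, syllable count).
CANDIDATES = [
    ("Helen", "f", 2),
    ("Ruth", "f", 1),
    ("Amanda", "f", 3),
    ("Peter", "m", 2),
    ("Mark", "m", 1),
    ("Jonathan", "m", 3),
]


def pick_pseudonym(real_name, gender, syllables, used):
    """Return an unused candidate with the same gender and syllable count.

    `used` is a set of pseudonyms already assigned; it is updated in place.
    Returns None when no candidate fits, so one can be chosen by hand.
    """
    for name, cand_gender, cand_syllables in CANDIDATES:
        if (cand_gender == gender
                and cand_syllables == syllables
                and name != real_name
                and name not in used):
            used.add(name)
            return name
    return None
```

Keeping the set of used pseudonyms outside the function means two speakers
in the same corpus are never given the same replacement name.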

We are currently collecting and transcribing another Corpus and have
followed the same general principles (with a few differences due to
the differing nature of the two corpora).

4) Svenja Sachweh, Freiburg: nursing homes corpus

Since I'm working in the sensitive area of communication in nursing homes
for the aged, I practically changed everything in order to guarantee 100%
anonymity. Due to the fact that I did not have the technical means to erase
something from my tape-recordings, I did not do that. (The chances are very
good that no-one else but me will ever listen to the tapes.) However, I
replaced everything (i.e. first and last names, place names, addresses,
etc.) in the transcripts.

I changed first names for ethical reasons - after all, I promised to do
that when I asked for permission to audiotape conversations!

I did use a search and replace function. However, since I keep finding
instances of real names during analysis, I also change names manually.
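
[Illustration added to this summary, not part of the reply.] A search and
replace of this kind could look roughly like the Python sketch below. The
function name replace_names and the example mapping are hypothetical. The
sketch assumes exact, case-sensitive spellings, which is one reason real
names may still turn up later: misspellings, nicknames and inflected forms
are not caught.

```python
import re


def replace_names(text, mapping):
    """Replace each real name in `mapping` with its pseudonym, whole words only."""
    if not mapping:
        return text
    # Try longer names first so that "Anna Maria" is matched before "Anna".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)
```

For example, with the mapping {"Anna": "Ella", "Anna Maria": "Ella Sofia"},
the word boundaries leave a longer name such as "Annabel" untouched.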

5) Susan Meredith Burt, University of Wisconsin Oshkosh: University
students corpus

I have done work with taped conversations 1) between university
students--Americans paired with students from other countries--and 2) of my
family the year that we hosted a foreign student in our home. With
conversations between students, I have changed their names to plausible
first names that begin with the same letter. Claire would become Claudine,
for example. This makes it possible for me to remember who's who. In the
case of people the speakers talk about, I change first name and last name
the same way. In the case of my family, I assumed that anonymity was
impossible, so I left our own names the way they are, but I systematically
changed the name of our German guest. I have not erased any names on the
tapes. I simply change the names as I transcribe. This is not hard--you
just have to think out everyone's pseudonyms before you begin transcribing.
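
[Illustration added to this summary, not part of the reply.] Working out
everyone's pseudonyms in advance, with the same initial letter as the real
name, could be sketched in Python as below. The name pool and the function
name build_pseudonyms are hypothetical; in practice the pool would be chosen
by hand so that the names are plausible for the speakers concerned.

```python
# Hypothetical pool of plausible first names, grouped by initial letter.
POOL = {
    "C": ["Claudine", "Carol"],
    "M": ["Monika", "Margit"],
}


def build_pseudonyms(real_names, pool):
    """Assign each real name an unused pool name with the same initial."""
    table = {}
    taken = set()
    for real in real_names:
        options = [
            name for name in pool.get(real[0].upper(), [])
            if name != real and name not in taken
        ]
        if not options:
            raise ValueError(f"no pseudonym left for initial {real[0]!r}")
        table[real] = options[0]
        taken.add(options[0])
    return table
```

The resulting table can be kept beside the transcriber, so the same
replacement is used every time a given speaker is mentioned.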

6) Christine Cheepen, University of Surrey

I always (...) replace names by syllabic equivalents. Sometimes it
may not be necessary, but I think it is safer, just in case someone objects
later on.

The reasons given for changing names:
Legal and ethical - casual conversation nearly always involves some gossip
about people not present. In what I would call transactional (as opposed
to interactional) dialogue - e.g. service calls, there is of course the
problem of confidentiality.

Names changed manually or automatically:
It very much depends on how many items need to be changed. Sometimes I
have transcribed casual conversation by changing manually as I transcribe,
but in those cases I always do a search and replace at the end in case I've
missed some.
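
[Illustration added to this summary, not part of the reply.] A final check
for names missed during manual changing could be sketched in Python as
follows. The function name find_leftovers is hypothetical, and the sketch
assumes a plain-text transcript and a list of the real names as they were
spelled by the transcriber.

```python
import re


def find_leftovers(transcript, real_names):
    """Return (line number, name) pairs where a real name still appears."""
    hits = []
    for lineno, line in enumerate(transcript.splitlines(), start=1):
        for name in real_names:
            # Whole-word, case-insensitive match of the real name.
            if re.search(r"\b" + re.escape(name) + r"\b", line, re.IGNORECASE):
                hits.append((lineno, name))
    return hits
```

Reporting line numbers rather than replacing silently lets the transcriber
look at each remaining instance in context before changing it.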

7) Marco Rocha, University of Sussex

I have invariably replaced all names and addresses, including names of
buildings, such as hospitals. I have not erased them from tape recordings.

I have personally assured informants that anonymity would be guaranteed. The
conversations occur in a hospital and many of them involve medical
information of a private nature. Reasons (for replacing names) are thus
ethical.

Names were changed manually as the data were transcribed by myself.
Replacements used attempt strenuously to retain the prosodic features of the
speech recorded.

8) Bill Fisher: TIMIT, SWITCHBOARD and CALLHOME

I've been involved in the production and processing
of speech corpora sponsored more-or-less directly by
ARPA for a long time, starting with TIMIT and going
through SWITCHBOARD and CALLHOME. I don't believe there
has ever been an effort to disguise personal names
that are used in the conversations we record, although
speaker i.d. is a thinly-disguised code in accompanying
tables of speaker information.

There are probably 2 reasons for what may seem to be
a very lax policy: 1) the subjects sign a legal form
allowing the recording agency to do anything they want
to with the speech; 2) since the main use of the corpora
is to train and test speech recognizing computers,
a mutilated speech wave would hurt.

Bill Fisher

9) Dan I. Slobin, University of California at Berkeley: child language
corpora

In child language research it has been the norm for the past 30 years or
more to protect the identity of the child by assigning a pseudonym. Roger
Brown started this in his pioneering work at Harvard in 1962, naming the
first two children in the study "Adam" and "Eve." His proposal was to work
forward through the Bible, and so the third child he studied was named
"Sarah." Several others followed this model, naming children "Noah" and
"Shem." Other investigators simply picked another name for the child. Last
names are always deleted, and the names of other participants are also
changed (siblings, visitors, etc.). This is easily done by global
replacement.

It is part of our agreement with committees for the protection of human
subjects that all participants in psychological research be kept anonymous,
and that the data can be used and distributed only with the consent of the
subjects. Every university has a standing committee for this purpose, with
its own set of rules.

Tape- and video-recordings can only be used with the consent of the persons
who were recorded (or their parents). It is more difficult in the case of
videotapes, and one must be very careful about informed consent, since the
identity of the participants can't be hidden from view.

-----------------------------------------------
Kristine Hasund
English Department
Høgskolen i Agder
Tordenskjoldsgate 90
4604 Kristiansand
Tel: 38 14 16 43
Email: Kristine.Hasund@hia.no
-----------------------------------------------