Transcription and Markup Procedure
Ideally, a corpus the size of the WSC would be transcribed by a few highly trained transcribers. Unfortunately, our resources were very limited, so transcription was spread over a considerable period, resulting in a much higher turnover of transcribers than was desirable.
We adopted the following procedures in relation to transcription and markup (see also Johnson and Holmes 1996).
1. Checking eligibility
Because transcribing is so time-consuming, tapes were ideally not transcribed until the sound quality of the recording and the eligibility of contributors as speakers of NZE had been checked, and background information sheets had been obtained for all contributors.
2. Cover sheet
A cover sheet was created for every transcription. These cover sheets recorded details of the recording, the speakers and the transcription. In the release version of the WSC, the cover sheets have been removed, but this information is provided in section 15, Texts. It is also available in the database files included on the WSC CD (see section 17, Database Files).
3. Transcription
A detailed Transcription Manual describing the main conventions adopted for the transcription of the WSC is being prepared for publication. The main conventions are outlined in Johnson and Holmes (1996). They were modified for the release version of the WSC, and the conventions used in the release version are described in section 14, Transcription and Markup Conventions.
4. Proofreading
Each transcript was proofread by someone other than the transcriber.
5. Editing
The original transcriber edited the transcript on the basis of the proofreader's amendments. Where the transcriber disagreed with an amendment, the point of disagreement was discussed. If an agreed transcription could not be reached, a third opinion was sought for arbitration. This process served not only to maintain the quality of transcription but also as a valuable training method for transcribers.
6. Printing and filing
Once editing had been completed and the number of words in the extract had been counted again, a final copy of the transcript was printed and filed.
7. Manual markup
After all extracts were edited, each extract was listened to again and manual markup was added. This included the insertion of text/intonation unit markers.
8. Automated markup
The remaining markup was automated and mainly involved replacing our transcription conventions with the SGML-conformant ICE conventions (see section 14).
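The kind of automated replacement described in step 8 can be illustrated with a minimal sketch. It is not the script used for the WSC. The working-transcript symbols on the left-hand side of each rule (`+`, `++`, `((word))`, `[laughs]`) are hypothetical, and the ICE-style tags on the right-hand side are illustrative only; section 14 remains the authoritative description of the conventions. The function name `convert_line` is likewise an assumption made for this example.

```python
import re

# Hypothetical working conventions mapped to illustrative ICE-style SGML tags.
# Every pair below is an assumption made for demonstration purposes only;
# the conventions actually used are those described in section 14.
RULES = [
    # Vocal noise noted in square brackets, e.g. [laughs]
    (re.compile(r"\[(laughs|coughs)\]"), r"<O>\1</O>"),
    # Uncertain transcription in double parentheses, e.g. ((maybe))
    (re.compile(r"\(\(([^()]*)\)\)"), r"<?>\1</?>"),
    # Long pause: a free-standing "++" (handled before the short pause)
    (re.compile(r"(?<!\S)\+\+(?!\S)"), "<,,>"),
    # Short pause: a free-standing "+"
    (re.compile(r"(?<!\S)\+(?!\S)"), "<,>"),
]


def convert_line(line: str) -> str:
    """Apply each replacement rule, in order, to one transcript line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(convert_line("yeah + i think ++ ((maybe)) [laughs]"))
```

Keeping the rules in an ordered list makes the precedence explicit (long pauses are handled before short ones) and lets the mapping be revised without changing the conversion function itself.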