Re: Corpora: Santa Barbara Corpus

From: Lou Burnard (lou.burnard@computing-services.oxford.ac.uk)
Date: Mon Aug 07 2000 - 17:55:38 MET DST

    On Mon, 7 Aug 2000, Chris Manning wrote:

    |On 7 August 2000, Lou Burnard wrote:
    | > Hmm. So instead of using pre-existing standards which at least have a
    | > chance of being implemented across different computer platforms, it's
    | > better to make up an entirely arbitrary set of codes of your own for
    | > which *everyone* has to write their own software?
    |
    |This is a little harsh. The transcription format used has existed and
    |been developed for many years in the conversational/discourse analysis
    |community -- and versions of it can be found in books such as Edwards'
    |Talking Data: Transcription and Coding in Discourse Research or
    |Schiffrin's Approaches to Discourse.
    |
    |At most the LDC could be faulted for leaving the data in such a format
    |-- one clearly designed more for human observation than easy computer
    |manipulation -- rather than converting it to a more computer friendly
    |standard markup.

    Fair point, well made. Thanks Chris! Put the harshness down to my
    general gloom at being confronted with 300 email messages after ten
    days on a beach in Northern Portugal... But the devil in all digital
    affairs is in the detail and it's that phrase "versions of it" that gives
    away why it's a retrograde step for a project with such high visibility,
    importance, and resourcing to distribute such wonderful data in a way
    that makes it REALLY DIFFICULT for a computer to analyse it. If we're only
    concerned to produce data for humans to read, let's print it out on bits
    of paper.

    Lou

     ----------------------------------------------------------------
     Lou Burnard http://users.ox.ac.uk/~lou
     ----------------------------------------------------------------


