Re: Corpora: Santa Barbara Corpus

From: Chris Manning (manning@CS.Stanford.EDU)
Date: Mon Aug 07 2000 - 17:50:28 MET DST


    On 7 August 2000, Lou Burnard wrote:
    > Hmm. So instead of using pre-existing standards which at least have a
    > chance of being implemented across different computer platforms, it's
    > better to make up an entirely arbitrary set of codes of your own for
    > which *everyone* has to write their own software?

    This is a little harsh. The transcription format used has existed and
    been developed for many years in the conversational/discourse analysis
    community -- and versions of it can be found in books such as Edwards'
    Talking Data: Transcription and Coding in Discourse Research or
    Schiffrin's Approaches to Discourse.

    At most, the LDC could be faulted for leaving the data in such a format
    -- one clearly designed more for human observation than for easy computer
    manipulation -- rather than converting it to a more computer-friendly
    standard markup.

    Chris Manning


