Corpora: Summary: Corpus metadata

From: Mikko Lounela (mlounela@kotus.fi)
Date: Mon Jun 24 2002 - 09:38:16 MET DST


    Hi there again.

    About two weeks ago I posted a query about corpus metadata and
    promised to post a summary. Thank you very much for the answers (8 in
    total); here is the summary.

          - Mikko

    Here is the original query:

    >From mlounela@kotus.fi Mon Jun 24 09:41:28 2002
    >Date: Wed, 5 Jun 2002 13:36:14 +0300 (EET DST)
    >From: Mikko Lounela <mlounela@kotus.fi>
    >To: CORPORA@HD.UIB.NO
    >Subject: Corpus metadata
    >
    >
    >Hello everybody.
    >
    >I am currently trying to figure out what information to include in text
    >corpora metadata. At this point, I'm trying to collect references. So, if
    >you have any to share, I would be most grateful. Summary will follow.
    >
    > - Mikko Lounela

    Here is a brief summary:

    Paul Clough recommended two books:
    Corpus Linguistics (1996), Tony McEnery and Andrew Wilson, Edinburgh
    Textbooks in Empirical Linguistics; and
    Corpus Annotation (1997), Roger Garside, Geoffrey Leech and Tony McEnery,
    Longman.

    Mickel Grönroos said that the Language Bank of Finland uses a metadata
    set that resembles Dublin Core
    (<http://www.dublincore.org/documents/1999/07/02/dces/>).

    Lou Burnard pointed to the TEI Guidelines
    (<http://www.tei-c.org/Guidelines>, in particular chapters 5 and 23).

    Manne Miettinen suggested having a look at IMDI and OLAC
    (<http://www.mpi.nl/ISLE/index.html>,
    <http://www.language-archives.org/>).

    Rita Simpson recommended articles by Simpson & Powell in the book
    edited by Rita Simpson & John Swales, Corpus Linguistics in North
    America: Selections from the 1999 Symposium, 2001, Univ. of Michigan
    Press and another article by Simpson, Lucka & Ovens in the proceedings
    volume of TALC 1998, edited by Burnard & McEnery.

    Sven Hartrumpf suggested the Corpus Encoding Standard
    (<http://www.cs.vassar.edu/CES/>
    esp. <http://www.cs.vassar.edu/CES/CES1-3.html>).

    Martin Wynne gave a few pointers: the TEI Guidelines, section 8 of the
    BNC User Reference Guide
    (<http://www.hcu.ox.ac.uk/BNC/World/HTML/cdifhd.html>), and OLAC. He also
    mentioned a seminar to be held at the Oxford Text Archive
    (<http://www.oucs.ox.ac.uk/ltg/courses/summer/documents/corpora.htm>).

    Truus Kruyt recommended Kruyt & Dutilh 1997, available at <www.inl.nl>
    under Publications.

    Here are all the answers (some in Finnish):

    **************************************
    From p.clough@dcs.shef.ac.uk Mon Jun 24 09:42:46 2002
    Date: Wed, 5 Jun 2002 12:02:05 +0100
    From: Paul Clough <p.clough@dcs.shef.ac.uk>
    To: Mikko Lounela <mlounela@kotus.fi>
    Subject: Re: Corpora: Corpus metadata

    Mikko,

    Two references for you:

    Corpus Linguistics (1996), Tony McEnery and Andrew Wilson, Edinburgh
    textbooks in empirical linguistics.

    Corpus Annotation (1997), Roger Garside, Geoffrey Leech and Tony McEnery,
    Longman.

    These both mention meta-linguistic information.

    Best,

    Paul.

    ----------------------------------------------------------------------------
    Paul Clough

    Natural Language Processing Group,
    Department of Computer Science,
    University of Sheffield,
    G35 Regent Court,
    211 Portobello Street,
    SHEFFIELD,
    S1 4DP.

    **************************************


