Corpora: Summary of replies

From: Rodrigo Tadeu Gonçalves (acollon@ig.com.br)
Date: Wed May 29 2002 - 19:12:47 MET DST

    Hello,

    I'm posting the summary of the answers I got from my last help message:

    Kiril Simov <kivs@bultreebank.org> wrote:

    Dear Rodrigo,

    Please check our system for corpora development, CLaRK. You can
    download it from:

    http://www.BulTreeBank.org

    and then follow the CLaRK system link.

    ----------

    Rita Carol Simpson <ritacsim@umich.edu> wrote:

    Hello,
    Our website on the MICASE corpus has some pages that deal specifically
    with transcription and markup of a corpus of spoken English. These may
    be useful to you insofar as they relate directly to issues involved in
    corpus-building.

    ----------

    Eric Atwell wrote:

    Rodrigo,
    Please could you summarise any replies you get and post this summary
    back to the CORPORA list - this may be useful to others building
    corpora, including students here at Leeds University.

    I suggest one place to start is ICAME, the International Computer
    Archive of Modern and Medieval English, host of the CORPORA mailing list
    and of the ICAME website http://www.hd.uib.no/icame.html

    Info on the website which might help you includes Manuals for the corpora
    distributed by ICAME; most include background info on how the corpora were
    collected and tagged etc: http://khnt.hit.uib.no/icame/manuals/index.htm

    ICAME also publishes ICAME Journal, with back issues online on the website.
    ICAME Journal includes papers relevant to corpus building and tagging; you
    could start with paper(s) on the language genre(s) you are interested in,
    e.g.:

    Alejandro Curado Fuentes, "Exploitation and assessment of a Business English
    corpus through language learning tasks", ICAME Journal Vol.26 pp5-32, 2002

    Norma Pravec, "Survey of learner corpora", ICAME Journal Vol.26 pp81-114,
    2002

    Ma Dolores Ramirez Verdugo, "Non-native interlanguage intonation
    systems: a study based on a computerised corpus of Spanish learners of
    English", ICAME Journal Vol.26 pp115-132, 2002

    Claudia Claridge, "Causal Clauses in written and speech-related genres
    in Early Modern English", ICAME Journal Vol.25 pp31-64, 2001

    Eric Atwell, George Demetriou, John Hughes, Amanda Schiffrin, Clive
    Souter and Sean Wilcock, "A comparative evaluation of modern English
    corpus grammatical annotation schemes", ICAME Journal Vol.24 pp7-24, 2000

    Merja Kytö, Juhani Rudanko and Erik Smitterberg, "Building a bridge
    between the present and the past: A corpus of 19th-century English",
    ICAME Journal Vol.24 pp85-98, 2000

    Winnie Cheng and Martin Warren, "Facilitating a description of
    intercultural conversations: the Hong Kong Corpus of Conversational English",
    ICAME Journal Vol.23 pp5-20, 1999

    Manfred Markus, "Getting to grips with chips and Early Middle
    English text variants: sampling Ancrene Riwle and Hali Meidenhad",
    ICAME Journal Vol.23 pp35-52, 1999

    Arja Nurmi, "The Corpus of Early English Correspondence Sampler (CEECS)",
    ICAME Journal Vol.23 pp53-64, 1999

    Tobias Rademann, "Using online electronic newspapers in modern
    English-language press corpora: Benefits and pitfalls", ICAME Journal
    Vol.22 pp49-72, 1998

    Minna Vihla, "Medicor: A corpus of contemporary American medical texts",
    ICAME Journal Vol.22 pp73-80, 1998

    Rainer Siemund and Claudia Claridge, "The Lampeter Corpus of Early Modern
    English Tracts", ICAME Journal Vol.21 pp61-70, 1997

    Gregory John Watson, "The Finnish-Australian English Corpus",
    ICAME Journal Vol.20, pp41-70, 1996

    Anneli Meurman-Solin, "A new tool: The Helsinki Corpus of Older Scots
    (1450-1700)", ICAME Journal Vol.19, pp49-62, 1995

    Roger Garside, "The marking of cohesive relationships: tools for the
    construction of a large bank of anaphoric data",
    ICAME Journal Vol.17 pp5-28, 1993

    Merja Kytö and Matti Rissanen, "A language in transition: the Helsinki
    corpus of English texts", ICAME Journal Vol.16, pp7-26, 1992

    Elizabeth Green and Pam Peters, "The Australian Corpus project and
    Australian English", ICAME Journal Vol.15 pp.37-54, 1991

    Brian MacWhinney and Catherine Snow, "The Child Language Data Exchange
    System CHILDES", ICAME Journal Vol.14 pp.3-25, 1990

    Louis Milic, "A new historical corpus", ICAME Journal Vol.14, pp.26-39, 1990

    Sidney Greenbaum, "The International Corpus of English",
    ICAME Journal Vol.14 pp.106-108, 1990

    Clive Souter, "The COMMUNAL project: extracting a grammar from the
    Polytechnic of Wales Corpus", ICAME Journal Vol.13, pp.20-27, 1989

    Nelleke Oostdijk, "A corpus for studying linguistic variation",
    ICAME Journal Vol.12, pp3-14, 1988

    Marion Owen, "Evaluating automatic grammatical tagging of text",
    ICAME Journal Vol.11 pp.18-26, 1987

    Pam Peters, "Towards a corpus of Australian English",
    ICAME Journal Vol.11 pp.27-38, 1987

    K Ahmad and G Corbett, "The Melbourne-Surrey Corpus",
    ICAME Journal Vol.11 pp.39-43, 1987

    Charles Meyer, "Punctuation practice in the Brown Corpus",
    ICAME Journal Vol.10, pp.80-95, 1986

    Barbara Booth, "Revising CLAWS", ICAME Journal Vol.9 pp.29-35, 1985

    Geoffrey Leech, Roger Garside and Eric Atwell, "The Automatic Grammatical
    Tagging of the LOB Corpus", ICAME Journal Vol.7 pp.13-33, 1983

    J M Gill, "The Gill Corpus", ICAME Journal Vol.4 pp.7-8, 1980

    Louis Milic, "The Augustan Prose Sample and the Century of Prose Corpus",
    ICAME Journal Vol.4, pp.11-12, 1980

    ICAME Journal also includes reviews and abstracts of books and other
    publications relevant to corpus building and annotation, as "pointers"
    to the wider research literature. However, NOTE that some of the
    earlier papers cited above pre-date Windows-XP so the software may not
    be readily re-usable on today's Windows-based PCs :)

    Last but DEFINITELY not least, I recommend the searchable ICAME
    bibliography database recently put online by Knut Hofland:

    http://korpus.hit.uib.no/icame/bib_search.html

    ----------

    I'd like to thank you all for helping me with the links and bibliography.
    We're trying to start a project to encode a corpus of graded texts for
    Brazilian learners of English at the Federal University of Parana, as my
    end-of-course monograph.

    Thanks again for the attention,

    Best wishes,

    Rodrigo T. Gonçalves


