Corpora: Readings in Corpus Linguistics

From: Geoffrey Sampson (geoffs@cogs.susx.ac.uk)
Date: Mon Sep 03 2001 - 10:49:11 MET DST

    Dear Corpora Listers,

    As mentioned earlier this summer, Anne Wichmann and I are now definitely
    planning to edit a book of Readings in Corpus Linguistics, and we are
    beginning to negotiate with publishers. This message is an invitation to
    the corpus community to help us make it a useful publication.

    We certainly haven't found it difficult to assemble lists of items worthy
    of inclusion; in fact, we already have a set of possibles which is quite
    a bit too long and will have to be cut down. But, even so, we could very
    easily have overlooked crucial papers which have done far more to define
    or advance the field than some of those we are thinking of including.
    Many of you will have personal favourite items in the literature, perhaps
    gems that you know about but which appeared in hard-to-get-hold-of places,
    or which are accessible but don't seem to be appreciated as widely as they
    deserve. Anne and I would really like to hear your nominations --
    obviously we won't necessarily include them, but we certainly ought to
    consider them. For that matter, we'd like to hear your ideas about papers
    that everyone knows and recognizes as fundamental: you may think they are
    too obvious to mention, but we could well have forgotten about them.

    Our interpretation of "Corpus Linguistics" is a broad one. We want the
    book to cover both technical NLP and humanities approaches. During
    our careers, the subject has evolved from a "minority of a minority"
    speciality into a major concern of linguistics and computing departments,
    and it has been changing and developing almost explosively over the last
    decade. As a result, we have the impression that many people who have
    been drawn in recently are in the situation of the blind men with the
    elephant -- they have learned about the bit of the subject they
    deal with directly, but they feel at sea about the overall purview of
    the discipline, and find it hard to read the literature because they don't
    yet have much perspective on where the subject has come from or how the
    various aspects relate to one another. We want to produce a book that
    gives that perspective. Thus we are especially keen to include
    papers that bridge the divide between technical matters like XML, stats,
    or automatic parsing, and humanistic considerations such as literary
    language, language teaching, sociolinguistics, or historical linguistics.

    We also want, so far as possible, to find short pieces, so that we can
    introduce many different topics without the book becoming too big. And
    we aim to include a leavening of papers that are entertaining as well
    as informative; we hope the book may help to attract new recruits to the
    discipline by showing them that corpus linguistics is fun.

    Looking through our own tentative list of "possibles", topics which seem
    thinly covered so far include semantics, and "World Englishes and
    nonstandard dialects" -- we have one or two possibles for each, but we
    are still looking for "killer papers". And of course our list of twenty
    or so corpus linguistics areas may itself have overlooked important
    topics.

    Please share your opinions with us!

    Geoffrey Sampson

    G.R. Sampson, Professor of Natural Language Computing

    School of Cognitive & Computing Sciences
    University of Sussex
    Falmer, Brighton BN1 9QH, GB

    e-mail geoffs@cogs.susx.ac.uk
    tel. +44 1273 678525
    fax +44 1273 671320
    web http://www.grsampson.net


