Corpora: "must have" lists

From: Geoffrey Sampson (geoffs@cogs.susx.ac.uk)
Date: Tue Jun 19 2001 - 18:10:19 MET DST

    Lou suggests organizing a "reader" of classic articles in corpus linguistics
    as an electronic corpus. Hmm ...

    I feel a bit sceptical about that, for two reasons. One is that I doubt that
    the primary purpose (of helping newcomers "read themselves in" to the field)
    would be achieved well by an electronic corpus of articles -- people like
    reading off paper bound into journals and books, not off the screen.
    Moreover, publishers would be less enthusiastic about publishing a collection
    (and copyright permissions might be harder to get) if the material were also
    being made available electronically. (I know there are exceptions, such
    as the _State of the Art in Human Language Technologies_ book available both on
    the Web and from CUP -- but I think they will always be exceptions rather
    than the norm.)

    Also, I am not sure that Lou's second purpose, of providing a source from
    which one could monitor the development of terms of art in the field, would
    really be achieved all that well by a collection of the N classic readings
    in the history of corpus linguistics. Lou knows a lot more about lexicography
    than I do, but it seems to me that the limited number of items one would
    most want to encourage newcomers to read would not necessarily coincide with
    the texts that best exemplified the development of terminology -- for that,
    would a larger bulk of less-exciting items not be more informative?

    But I certainly agree with Lou that it would be interesting to see how far
    people's "must have" lists coincided. Since the note of mine to which
    Lou is responding, Anne Wichmann and I have discussed whether we might
    actually propose a collection like this to a publisher -- I'm not sure
    whether either of us is yet clear that we want to commit ourselves to the
    effort, but we are clear that one desirable thing would be to use the
    Corpora List to get people to propose their personal Top N lists. I had
    thought we would probably wait till we actually got to the stage of
    putting a synopsis in front of a publisher, if we ever do -- but I suppose
    since Lou has raised the idea, people might want to have fun over the
    summer putting together such lists! Mine would include
    the article from _ICAME News_ by ??Stig Johansson and Geoff Leech?? about
    significant vocabulary differences between British and American English,
    and the one from a book edited by Nelleke Oostdijk, by ?Ken
    Church and Bill Gale?, "What is wrong with adding one?" -- but I haven't
    started seriously working out a proper list.

    Geoff

    G.R. Sampson, Professor of Natural Language Computing

    School of Cognitive & Computing Sciences
    University of Sussex
    Falmer, Brighton BN1 9QH, GB

    e-mail geoffs@cogs.susx.ac.uk
    tel. +44 1273 678525
    fax +44 1273 671320
    web http://www.grsampson.net


