Corpora: New Bookmark site for Corpus-based Linguists

From: David Lee (david_lee00@hotmail.com)
Date: Mon Dec 03 2001 - 00:43:30 MET

    Dear All,

    I would like to announce to the list that I've just created a web site with
    links for corpus-based linguistics. The web pages started as a resource for
    the MA students here at Lancaster doing the corpus-based linguistics course,
    but perhaps it may have a wider appeal. Before I go into any more detail,
    however, I wish to thank the many corpus linguists who 'test-drove' the site
    and took the time to give me valuable feedback. They've helped make the site
    more accessible to all and more complete.

    There are already a number of sites out there with similar content, but here
    are a few key things about my site:

    (1) it's up-to-date (I've checked practically all the links to make sure
    there are no dead ones, except those which are permanently dead, i.e. no
    known new URL!); (2) it focuses on links for linguists and lg teachers (not
    NLP/lg engineering); (3) my listings are mostly annotated (i.e. have
    descriptions of the links, so you don't always have to click a link to find
    out what it's about); (4) I hope I've brought together in *one* place all
    the information on corpora, software tools, bibliographies, references,
    electronic papers, mailing lists, on-line courses, conferences, etc. that
    people doing corpus work could possibly need.

    The web site is, I believe, fairly exhaustive (for English corpora, tools,
    and references, at least), but I would make the usual plea for people to
    contact me if they know of links and resources that I've missed, if they
    spot any mistakes (dead links, non-existent sites, wrong information,
    etc.), and especially if they have written papers/notes/squibs which are
    available *on-line* for downloading.

    Please take the trouble to let me know about anything you'd like to share
    with the rest of the research community (links, papers and resources... e.g.
    if you've collected a (small) corpus or collection of materials which is
    available on-line or could be made available). The site will be more
    useful if people actively help to keep it current and complete.

    The URL alias for my site is:

    http://devoted.to/corpora

    which I think is rather nice ;-). Please bookmark this web address rather
    than anything else, as this is permanent, whereas other page/frame addresses
    may well change their names without warning. The downside of using this
    mnemonic alias is that a little advert window pops up... just ignore it and
    close it down immediately.

    As I've said, I would appreciate it if people could have a look and give me
    feedback (e.g. "I think it's great!" or if you have any suggestions on how
    to improve the structure/organisation of information, or if there are any
    glaring omissions), bearing in mind that this site is meant primarily for
    *linguists* or lg teachers (and, secondarily, for humanities scholars) who
    happen to work with corpora, not speech technologists or NLP people
    (although I've also provided the most important 'technical' links, so that
    people who wish to get more info in that direction may do so).

    I think that this site is more organised and more complete than most of the
    other sites that I've seen, which are geared more towards *NLP* /lg
    technology, and also tend to lump everything on one page (so that you have
    to wade through lots of undifferentiated stuff to find what you want).
    I've tried not to replicate other bookmark sites (e.g. Mike Barlow's and
    Manuel Barbera's, whose links for *non-English* corpora I have not even
    tried to duplicate), but at the same time I've deliberately repeated some of
    the main links for the sake of convenience (there is no point in continually
    sending people to other people's web sites!), so that as much info as
    possible is provided on-site.

    Hope this will be of use to some.

    Regards,

    David Lee

    P.S. Some people may be interested to know that I've now produced a new
    version of my BNC Index: for the BNC World Edition (BNC version 2). You
    might want to download this new version if the World Edition is the corpus
    you're now using (which you should do... the World Edition has fewer tagging
    and text classification errors than BNC 1...). My new web address for this
    is: http://clix.to/davidlee00 (the old address is permanently gone).


