Re: Corpora: Brill's vs. CLAWS

From: E S Atwell (eric@comp.leeds.ac.uk)
Date: Tue Jul 17 2001 - 13:51:57 MET DST

    One advantage of CLAWS tagging is that Lancaster University offers a
    professional tagging service, so you can "outsource" your tagging; see
    http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/tagservice.html
    You can tag a sample of 300 words free via a web browser and, if you
    like what you see, contact Chris Needham, chris@comp.lancs.ac.uk, for a
    quotation on delivery schedule and cost. Alternatively, you can buy a
    site licence to set up and run the tagger yourself for GBP 750.

    Using Brill's tagger is more like "Do-It-Yourself":
    you can download the tagger software, free, from Eric Brill's homepage
    http://research.microsoft.com/~brill/
    then run it on your own texts yourself. Alternatively, you could try our
    free email-server version: just email your text (plain ASCII, not
    HTML/doc/etc., and not an attachment) to amalgam-tagger@comp.leeds.ac.uk
    (with Subject: Brown) and it should be returned with the tags supplied by
    the standard Brill tagger. Either way, there is no equivalent of Chris
    Needham to advise and guide you through the process: we do not have a
    Project Manager to assist customers of this free service...
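
    For anyone who prefers to script the email route, here is a minimal
    Python sketch of such a request: plain ASCII body, no attachment, tagset
    name in the Subject. The helper name build_tagging_request and the sender
    address are illustrative assumptions, not part of the service; the code
    only builds the message, and sending it is left to your own mail setup.

```python
from email.message import EmailMessage

TAGGER_ADDRESS = "amalgam-tagger@comp.leeds.ac.uk"  # address given in this post


def build_tagging_request(text: str, sender: str, tagset: str = "Brown") -> EmailMessage:
    """Build a plain-text tagging request (hypothetical helper, not an official client)."""
    # The service expects plain ASCII, so fail early on anything else.
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        raise ValueError("text must be plain ASCII") from err

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = TAGGER_ADDRESS
    msg["Subject"] = tagset  # the tagset name goes in the Subject line
    # A str passed to set_content() gives a single text/plain body, no attachment.
    msg.set_content(text, charset="us-ascii")
    return msg


if __name__ == "__main__":
    # Placeholder sender address; replace with your own.
    request = build_tagging_request("The cat sat on the mat.\n", "you@example.org")
    print(request)
```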

    One advantage of Brill's tagger is greater flexibility in the tagsets: the
    original version comes trained to apply the Brown Corpus tagset, but it
    can be retrained with another tagged corpus to apply another tagset. You
    can "do this yourself" with your own preferred tagged corpus. You could
    also try the ICE-GB tagset on your own texts by using the amalgam-tagger
    service: this time, email your text to amalgam-tagger@comp.leeds.ac.uk
    with Subject: ICE
    or you can try other tagsets by changing the Subject to one or more of
    Brown ICE LLC LOB Parts POW SEC UPenn
    - see http://www.comp.leeds.ac.uk/amalgam/amalgam/amalgtag3.html
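
    If you script such requests, a small check on the Subject line can catch
    typos before the email goes out. The sketch below is only an
    illustration: the function name subject_for is hypothetical, and both the
    case-insensitive matching and the single-space separator between several
    tagsets are my assumptions, based on how the names are listed above; the
    web page is the authority on the exact format.

```python
# Tagset names as listed in this post.
KNOWN_TAGSETS = ("Brown", "ICE", "LLC", "LOB", "Parts", "POW", "SEC", "UPenn")


def subject_for(*tagsets: str) -> str:
    """Return a Subject line naming one or more tagsets (hypothetical helper)."""
    if not tagsets:
        raise ValueError("name at least one tagset")
    # Assumption: accept any capitalisation, but emit the spelling listed above.
    canonical = {name.lower(): name for name in KNOWN_TAGSETS}
    chosen = []
    for name in tagsets:
        key = name.strip().lower()
        if key not in canonical:
            raise ValueError(f"unknown tagset: {name!r}")
        if canonical[key] not in chosen:  # ignore repeated names
            chosen.append(canonical[key])
    # Assumption: several tagsets are separated by single spaces.
    return " ".join(chosen)


if __name__ == "__main__":
    print(subject_for("ICE"))
    print(subject_for("brown", "UPenn"))
```

    A name that is not in the list, such as BNC, is rejected with a
    ValueError instead of producing a request the service cannot handle.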

    BNC is NOT one of the tagsets we offer, which is unfortunate since it is a
    strong candidate for de-facto standard for (British) English corpus-based
    research, not only in Corpus Linguistics but also in Natural Language
    Processing. Significantly, the BNC C5 and C7 tagsets are included in
    Jurafsky and Martin, "Speech and Language Processing", Prentice-Hall 2000,
    the standard textbook for final-year-undergraduate/Masters-level NLP
    (see http://www.cs.colorado.edu/~martin/slp.html),
    so you should find it easier to recruit researchers with knowledge of BNC.

    Eric Atwell

    -- 
    Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: 0113-2335430  MOBILE: 0775-1039104 FAX: 0113-2335468
    WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric@comp.leeds.ac.uk
    

    On Mon, 16 Jul 2001, Veronika Koller wrote:

    > Dear list members,
    > without wanting to trigger a bi-partisan discussion, I would still like to
    > inquire about the advantages of Brill's tagger over CLAWS tagging service
    > or vice versa. The situation at our department leading to this question is
    > the following:
    > So far, we have only worked with Cobuild's Bank of English and
    > self-compiled corpora, using WordSmith Tools as a concordance program for
    > the latter. Currently, however, we are planning to obtain several other
    > corpora such as the BNC (incl. SARA), the Wolverhampton Corpus of Business
    > English (by the way: what kind of concordance program would work best with
    > that?) and ICE-GB (incl. ICE-CUP). We have had texts tagged by CLAWS and
    > the result proved to be quite useful for our purposes. Since we would very
    > much like to streamline our software resources as much as possible (which
    > doesn't seem to be much anyway), we'd rather know about the respective
    > (dis)advantages in advance. A helpful starting point might e.g. be if
    > someone could provide a sample text tagged with the help of Brill's.
    >
    > Your help will be much appreciated and a summary will be posted.
    >
    > Regards,
    > Veronika Koller
    > Mag.a Veronika Koller
    > Department of English/Business English
    > Vienna University of Economics and Business Administration
    > Augasse 9
    > A-1090 Vienna
    > Tel.: 43/1/31336-4068
    > Fax: 43/1/31336-747


