Re: Corpora: Help please - downloading text from the Web

From: Andrew Harley (aharley@cup.cam.ac.uk)
Date: Mon Mar 27 2000 - 10:53:49 MET DST

    At 11:34 AM 23/03/2000 GMT, Geoff Wilkins wrote:
    >
    >Hi. Can anyone help me with the following:
    >
    >I'm looking for software - preferably freeware or shareware - to
    >use to download text from Web sites, for use in a corpus.

    For the Cambridge International Corpus, we have used the following two
    products to download websites (after obtaining permission from the site
    owner - an important point that shouldn't be disregarded):

    > WEBWHACKER - http://www.bluesquirrel.com/whacker
    > The original off-line browser!
    >
    > GRAB-A-SITE - http://www.bluesquirrel.com/grabasite
    > An "Industrial Strength" off-line browser!

    WebWhacker compresses the downloaded data, while Grab-a-Site delivers it
    as HTML files organised in a directory structure. That is much easier for
    us to handle, so we now use Grab-a-Site.
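
Once a site has been saved as HTML files in a directory tree, the text still has to be separated from the markup before it can go into a corpus. The following is only a minimal sketch of that step using the Python standard library; it is not part of either product mentioned above, and the names `TextExtractor`, `html_to_text` and `extract_site`, as well as the assumption that files are UTF-8 with an `.htm` or `.html` extension, are illustrative.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style elements.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text chunks of one HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def extract_site(root):
    """Map each .htm/.html file under root (hypothetical path) to its text."""
    texts = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            # Assumption: UTF-8; undecodable bytes are replaced, not fatal.
            html = path.read_text(encoding="utf-8", errors="replace")
            texts[str(path.relative_to(root))] = html_to_text(html)
    return texts
```

Calling `extract_site("downloaded_site")` on a hypothetical download folder would return a dictionary keyed by relative file path. Real pages usually need more cleaning (navigation menus, repeated footers, mixed encodings), so this should be treated as a starting point only.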

    Andrew Harley
    Systems Development Manager
    English Language Teaching & Dictionaries
    Cambridge University Press

    Direct line: (01223)325880
    Fax: (01223)325850

    Try Cambridge International Dictionaries online (over one and a half
    million searches since August 1999) at:
    http://www.cup.cam.ac.uk/elt/dictionary

    We have recently published the Cambridge Dictionary of American English
    (book and CD-ROM combined for only $20.95): see http://www.cup.org/esl/cdae
    for more details and to order online.


