Re: Corpora: Help please - downloading text from the Web

From: Andrew Harley (aharley@cup.cam.ac.uk)
Date: Mon Mar 27 2000 - 10:53:49 MET DST

Next message: Jean Veronis: "Re: Corpora: Help please - downloading text from the Web"

Previous message: Dave Braze: "Re: Corpora: Help please - downloading text from the Web"
In reply to: Geoff Wilkins: "Corpora: Help please - downloading text from the Web"
Next in thread: Jean Veronis: "Re: Corpora: Help please - downloading text from the Web"
Reply: Jean Veronis: "Re: Corpora: Help please - downloading text from the Web"

At 11:34 AM 23/03/2000 GMT, Geoff Wilkins wrote:
>
>Hi. Can anyone help me with the following:
>
>I'm looking for software - preferably freeware or shareware - to
>use to download text from Web sites, for use in a corpus.

For the Cambridge International Corpus, we have used the following two
products to download websites (after obtaining permission from the site
owner - an important point that shouldn't be disregarded):

> WEBWHACKER - http://www.bluesquirrel.com/whacker
> The original off-line browser!
>
> GRAB-A-SITE - http://www.bluesquirrel.com/grabasite
> An "Industrial Strength" off-line browser!

WebWhacker compresses the data, while Grab-a-Site delivers it as HTML
organised in directory structures. That is much easier for us to handle, so
we now use Grab-a-Site.
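
Once a site has been saved as plain HTML files in a directory tree, turning
it into corpus text is mostly a matter of walking that tree and stripping the
markup. The following is only a rough sketch of that step, not the procedure
we actually use: it relies on Python's standard library alone, and the names
html_to_text, extract_site, mirror_dir and corpus_dir are made up for the
illustration. It also assumes the pages can be read as UTF-8 (undecodable
bytes are replaced), which will not hold for every site.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text content of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def extract_site(mirror_dir, corpus_dir):
    """Write one .txt file per .html/.htm file, mirroring the directory layout.

    mirror_dir and corpus_dir are hypothetical paths: the first is where the
    off-line browser saved the site, the second is where the text should go.
    """
    mirror_dir, corpus_dir = Path(mirror_dir), Path(corpus_dir)
    written = []
    for src in sorted(mirror_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() in (".html", ".htm"):
            dst = (corpus_dir / src.relative_to(mirror_dir)).with_suffix(".txt")
            dst.parent.mkdir(parents=True, exist_ok=True)
            # Assumption: UTF-8 input; bad bytes are replaced, not rejected.
            html = src.read_text(encoding="utf-8", errors="replace")
            dst.write_text(html_to_text(html), encoding="utf-8")
            written.append(dst)
    return written
```

Calling extract_site("mirror", "corpus") would then produce a parallel tree
of .txt files. Keeping the original directory layout makes it easy to trace
each corpus file back to the page it came from.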

Andrew Harley
Systems Development Manager
English Language Teaching & Dictionaries
Cambridge University Press

Direct line: (01223)325880
Fax: (01223)325850

Try Cambridge International Dictionaries online (over one and a half
million searches since August 1999) at:
http://www.cup.cam.ac.uk/elt/dictionary

We have recently published the Cambridge Dictionary of American English
(book and CD-ROM combined for only $20.95): see http://www.cup.org/esl/cdae
for more details and to order online.

This archive was generated by hypermail 2b29 : Mon Mar 27 2000 - 10:52:59 MET DST