Re: Corpora: Help please - downloading text from the Web

From: Dave Braze (davebraze@uconn.cted.net)
Date: Mon Mar 27 2000 - 03:07:29 MET DST

Next message: Andrew Harley: "Re: Corpora: Help please - downloading text from the Web"

Previous message: Thorsten Brants: "Corpora: LINC-2000"
In reply to: Knut Hofland: "Re: Corpora: Help please - downloading text from the Web"
Next in thread: Christian Coseru: "Re: Corpora: Help please - downloading text from the Web"

Knut Hofland wrote:

> On Thu, 23 Mar 2000, Geoff Wilkins wrote:
>
> > I'm looking for software - preferably freeware or shareware - to
> > use to download text from Web sites, for use in a corpus.
>
> I have used w3mir
> http://www.math.uio.no/~janl/w3mir/
> and
> SiteSnagger
> http://hotfiles.zdnet.com/cgi-bin/texis/swlib/hotfiles/info.html?fcode=000P7Z
> Both have shortcomings, but I have downloaded gigabytes of HTML-files
> with the programs.

There is also wget:

http://www.interlog.com/~tcharron/wgetwin.html

I've only used it a little, but it seems serviceable enough.

-Dave

--
Dave Braze
Linguistics Department, U-1145
University of Connecticut
Storrs, CT 06269-1145 USA

This archive was generated by hypermail 2b29 : Mon Mar 27 2000 - 10:26:08 MET DST