Corpora: Qs. reg. collection of hypertext corpus

rauchc@gmx.de
Thu, 17 Sep 1998 17:48:05 +0200 (MEST)

I'm currently working on a paper on linguistic features of private home
pages and face the following problem: how do I collect the data, i.e. the
pages/websites? I'm aware of quite a few automatic downloaders/offline browsers,
but none of those I've reviewed so far offers the following (in a convenient
way, that is):

Instead of manually entering the URL to be used as the starting point, I'd
like the tool to read a file that contains the URLs I want to download and then
crawl them one by one. While crawling, the program should stay within the
initial site (its directory on the respective web server, that is).
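To make the intended scoping rule concrete, here is a minimal sketch in Python. It only illustrates the "stay within the initial directory" test: a link is followed only if it resolves under the directory of its starting URL, and each line of the URL file is treated as an independent starting point. The file name `urls.txt` and all function names are hypothetical, not taken from any existing downloader.

```python
from urllib.parse import urljoin, urlparse

def scope_prefix(start_url):
    """Directory of the starting URL, e.g.
    http://example.com/~user/index.html -> http://example.com/~user/"""
    p = urlparse(start_url)
    directory = p.path.rsplit("/", 1)[0] + "/"
    return f"{p.scheme}://{p.netloc}{directory}"

def in_scope(start_url, link):
    """Follow a link only if it resolves under the start URL's directory."""
    return urljoin(start_url, link).startswith(scope_prefix(start_url))

# Each line of the URL file would be an independent starting point
# (crawl() is a placeholder for whatever fetch-and-follow loop is used):
# for start in open("urls.txt"):
#     crawl(start.strip(), follow_if=in_scope)

print(in_scope("http://example.com/~user/index.html", "pics/cat.html"))    # True
print(in_scope("http://example.com/~user/index.html", "/other/page.html"))  # False
```

Relative links within the starting directory pass the test, while links that climb above it or point to other hosts are rejected, which is exactly the "scan current directory only" behaviour described above.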

TeleportPro, for instance, sort of allows for this - however, it regards
the file itself as the initial URL and treats all URLs contained therein as
links (which is not what I want, since this makes the 'scan current
directory/domain only' feature useless).

Does anybody out there know of, or has anyone written, a program that would do
the above trick and perhaps also offers support for proxy servers and firewalls
(yes, I know that I'm more than a tad optimistic :)?

Thanks in advance,
Christoph Rauch

---
Sent through Global Message Exchange - http://www.gmx.net