RE: Corpora: Unpacking BNC with WinZip

Knut Hofland (Knut.Hofland@hit.uib.no)
Fri, 1 Jan 1999 22:54:08 +0100 (MET)

On Fri, 1 Jan 1999, Christopher Tribble wrote:

> The problem I keep coming back to is that the contents of A.TGZ, B.TGZ and
> C.TGZ on the BNC distribution CD ROMS all appear as large single files to
> WinZip, while all the other TGZ distribution files unpack as TAR
> archives containing _many_ files - which is what I want.

When I extracted some files from the BNC some time ago, I used a tar
program I found on the net.

Good sources for such searches are:

http://garbo.uwasa.fi/cgi-bin/vsl-front/QuickForm
http://ftpsearch.lycos.com/

The tar program had an option for gzipped tar files, so I did not have to
store the gunzipped tar file on my hard disk (which in those days was
rather small). The program did not have an option for changing Unix LF to
MS-DOS CR/LF, but I wrote a small program to do this. The tar program
works on the BNC a.tgz file (at least it does not give any error message
and the files seem OK).
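
The LF to CR/LF conversion described above is simple enough to sketch.
The following is not the lf2crlf program itself, only a minimal Python
illustration of the same idea; the function name lf_to_crlf is
hypothetical, and the step that first normalises existing CR/LF pairs is
an added assumption so that already-converted lines are not doubled.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert Unix LF line endings to MS-DOS CR/LF.

    Assumption: any CR/LF pairs already present are first reduced to LF,
    so running the conversion twice does not produce CR CR LF.
    """
    unix_text = data.replace(b"\r\n", b"\n")
    return unix_text.replace(b"\n", b"\r\n")
```

Working on bytes rather than text avoids having to guess the character
encoding of the corpus files.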

Tar is used as follows (. is the switch for gzipped files):

tar -.xvf F:a.tgz A/A0/A08    (to extract just file A08)
tar -.xvf F:a.tgz A/A0/       (to extract all files in A/A0)

These programs can be found at:

http://www.hit.uib.no/files/tar.exe
http://www.hit.uib.no/files/lf2crlf.exe
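
For readers who cannot run tar.exe, the selective extraction shown in the
tar commands above can be approximated with Python's standard tarfile
module. This is only a sketch under stated assumptions, not the method
used by the tar program: the function name extract_members and its
parameters are hypothetical, the archive is assumed to be trusted (member
names are not checked for ".." components), and member names are assumed
to look like A/A0/A08 as in the commands above.

```python
import os
import tarfile


def extract_members(archive_path, prefix, dest="."):
    """Extract regular files named `prefix` or stored under `prefix/`.

    Returns the list of member names written, in archive order.
    Assumption: the archive is trusted, so names are used as given.
    """
    target = prefix.rstrip("/")
    extracted = []
    # Mode "r|gz" reads the gzipped archive as a stream, so no gunzipped
    # copy of the whole tar file is ever written to disk.
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            if member.name != target and not member.name.startswith(target + "/"):
                continue
            out_path = os.path.join(dest, *member.name.split("/"))
            os.makedirs(os.path.dirname(out_path), exist_ok=True)
            # In stream mode each member must be read before moving on.
            with open(out_path, "wb") as out:
                out.write(archive.extractfile(member).read())
            extracted.append(member.name)
    return extracted


# Hypothetical usage, mirroring the two tar commands above:
#   extract_members("F:a.tgz", "A/A0/A08")   # just file A08
#   extract_members("F:a.tgz", "A/A0/")      # all files in A/A0
```

Because the archive is read as a stream, the whole file is still scanned
from start to end, but only the selected members are written out.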

Knut Hofland | Knut.Hofland@hit.uib.no
HIT-Centre (former NCCH) | http://www.hit.uib.no/knut/
University of Bergen, | Phone: +47 5558 9463
Allegt. 27, N-5007 Bergen, Norway | Fax: +47 5558 9470