%chmod u+x word2txt
%word2txt [file1] > [file2]
Please write if you have questions.
######################################
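#!/usr/bin/perl
# Keep tab, LF, CR, printable ASCII and the Latin-1 range 0xA0-0xFF
# (accented letters); delete every other byte. tr/// takes plain
# ranges, not bracketed classes, and hex escapes are written \xNN.
while (<>) {
    tr/\x09\x0A\x0D\x20-\x7E\xA0-\xFF//cd;
    print;
}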
######################################
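To avoid converting the files one by one, a small shell function along these lines might help. It is only a sketch: the name convert_all and its two arguments (a directory and the converter to run) are made up for illustration, and it assumes the Word files all end in .doc.

```shell
#!/bin/sh
# Hypothetical helper: run a converter over every .doc file in a
# directory, writing the output next to each input as a .txt file.
# Usage: convert_all DIR CONVERTER
convert_all() {
    dir=$1
    conv=$2
    for f in "$dir"/*.doc; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # ${f%.doc} strips the .doc suffix before .txt is appended.
        "$conv" "$f" > "${f%.doc}.txt"
    done
}
```

With the function defined in your shell, "convert_all . ./word2txt" should turn every .doc file in the current directory into a .txt file of the same name.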
Ari
On Thu, 2 Sep 1999, Marco Antonio Esteves da Rocha wrote:
> Dear all,
> Someone has collected a sizable corpus of literary works and documents
> written in Brazilian Portuguese throughout the nineteenth century. It is a
> valuable asset for us here and it has all been typed in MS Word, thus it is
> impossible to use all those software resources you all know. Does anyone
> know about a way to transform these .doc files into ASCII text files
> without having to do that one by one? If you feel tempted to suggest
> sitting on the curb and crying, please don't.
> Marco Rocha
> marcor@cce.ufsc.br