<pre>
Dear corporal mates,
I am in acute need of a simple program (DOS, Windows, Unix) that would
provide me with cumulative numbers of different words (types) as it skims
through a text word by word. In other words, the program should print out a
number for each word but increase the number only when a new type is
encountered. The output would be something like this:
1
2
3
4
4
5
6
6
6
...
Probably I could write this kind of program myself, but I do not have time
or ardour to reinvent the wheel. Maybe a simple Perl script would do the
trick? Thank you in advance for your support.
yours,
sampo
</pre>
How about this:
---------------
#!/usr/bin/perl
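use strict;
use warnings;

my $countDifferent = 0;   # number of distinct word types seen so far
my %words;                # word types already encountered

open(my $in, '<', '/path/to/file') or die "can't open the file: $!";
while (my $line = <$in>) {
    # Split on single whitespace characters (see the assumption noted below).
    my @words = split(/\s/, $line);
    foreach my $word (@words) {
        # Increase the count only the first time a type is encountered.
        if (!$words{$word}) {
            $countDifferent++;
            $words{$word} = 1;
        }
        # Print the running type count once per token.
        print "$countDifferent\n";
    }
}
close($in);
exit(0);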
---------------
It's primitive - but does what you want.
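It assumes that you are interested in orthographic words and that there is
always exactly one whitespace character between words; a run of several
spaces would yield empty strings, which would be counted as a word of their own.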
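If the text has irregular spacing, a variant along the following lines might be more forgiving. This is only a sketch: the helper name `cumulative_types` and the sample string are made up for illustration, and it takes the text as a string rather than reading a file. It relies on Perl's special `split ' '` form, which skips leading whitespace and treats any run of whitespace as one separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns the running
# count of distinct word types, one number per token in $text.
sub cumulative_types {
    my ($text) = @_;
    my %seen;       # word types encountered so far
    my $count = 0;  # number of distinct types
    my @running;    # one cumulative count per token

    # split ' ' ignores leading whitespace and collapses runs of
    # whitespace (spaces, tabs, newlines), so no empty "words" appear.
    foreach my $word (split ' ', $text) {
        $count++ unless $seen{$word}++;
        push @running, $count;
    }
    return @running;
}

# Made-up sample text with double spaces and a tab.
print "$_\n" for cumulative_types("the cat  saw the\tother   cat");
```

For the sample string this prints 1, 2, 3, 3, 4, 4, one number per line. Words are still compared as orthographic strings, so "Cat" and "cat," would count as different types from "cat".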
Best,
Sebastian
--
Sebastian Hoffmann
Englisches Seminar der Univ. Zürich
Plattenstrasse 47
CH-8032 Zürich
Tel: +41-1-634 3551
Fax: +41-1-634 4908