Here is a UNIX script:
% sort one | uniq > one.uniq
% sort two | uniq > two.uniq
% cat one.uniq one.uniq two.uniq | sort | uniq -c | sort -nr > output
Here is an example:
one:
==========
cat
dog
cat
mouse
two:
==========
cat
rabbit
elephant
rabbit
output:
==========
3 cat
2 mouse
2 dog
1 rabbit
1 elephant
Words with a count of 3 appear in both "one" and "two".
Words with a count of 2 appear in "one" only.
Words with a count of 1 appear in "two" only.
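If you want the three groups as separate files rather than reading the counts off "output", something along these lines should work. It is only a sketch: it rebuilds the small example above in a scratch directory, and the names both.txt, one-only.txt and two-only.txt are made up for illustration. It also assumes one word per line with no embedded whitespace.

```sh
#!/bin/sh
# Sketch only: work in a scratch directory so nothing existing is overwritten.
# (mktemp -d is widely available, but it is not part of POSIX.)
set -e
work=$(mktemp -d)
cd "$work"

# Recreate the two example lists from the message.
printf '%s\n' cat dog cat mouse > one
printf '%s\n' cat rabbit elephant rabbit > two

# The original pipeline: "one" is counted twice, "two" once.
sort one | uniq > one.uniq
sort two | uniq > two.uniq
cat one.uniq one.uniq two.uniq | sort | uniq -c | sort -nr > output

# uniq -c prints "count word"; awk splits on whitespace,
# so $1 is the count and $2 is the word.
# The output file names below are hypothetical.
awk '$1 == 3 { print $2 }' output | sort > both.txt      # in both lists
awk '$1 == 2 { print $2 }' output | sort > one-only.txt  # in "one" only
awk '$1 == 1 { print $2 }' output | sort > two-only.txt  # in "two" only
```

The two "only" files together answer the original question (words on one list and not on the other).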
--
Drago

Miles Osborne wrote:
>
> that's far too slow -use a hash table instead.
>
> now, this wouldn't be homework, would it?
>
> Miles
>
> Quoting Otto Lassen <otto@lassen.mail.dk>:
>
> > Hi
> > That could be done in any language:
> > 1. sort then two lists
> > 2. compare them word for word
> > 3. output words which are not in both lists
> > Regards
> > Otto Lassen
> >
> > At 21:54 15-11-2003 +0100, you wrote:
> > >Hi,
> > >
> > >I'm doing a project that involves comparing two very large word lists
> > >(~40.000 and 70.000 words). What I need to find out, is which words are on
> > >one list and not on the other (and/or vice versa).
> > >Can anyone give me a hint as to how to do this? (I was thinking; maybe a
> > >perl script?)
> > >
> > >Any help will be greatly appreciated.
> > >Best,
> > >Tine Lassen
--
Dragomir R. Radev, radev@umich.edu
Assistant Professor of Information, Electrical Engineering and Computer Science, and Linguistics, the University of Michigan, Ann Arbor
Phone: 734-615-5225
Fax: 734-764-2475
http://www.si.umich.edu/~radev
This archive was generated by hypermail 2b29 : Sat Nov 15 2003 - 23:14:17 MET