Re: word frequency lists?

Ted Pedersen (pedersen@seas.smu.edu)
Sun, 26 Nov 1995 09:37:33 -0600 (CST)

A small point in favor of frequency lists...

In trying to come up with frequency lists for bigrams and trigrams, I
find that when the corpus size hits 100,000 words I run out of memory
on my computer. While I might be able to tweak my program and push
that limit up to 200,000 or maybe 500,000 words (I doubt it), I think
the system limitations here will prevent me from producing bigram
and trigram counts for a 1,000,000-word corpus.
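To make the memory problem concrete, here is a minimal Python sketch of
a streaming n-gram counter. It is not the program described above (the
post doesn't show it); the function name `count_ngrams` is hypothetical,
and the sketch assumes whitespace tokenization and treats the corpus as
one continuous token stream. Only the last n tokens and the table of
counts are kept, so memory grows with the number of *distinct* n-grams,
which for trigrams can be a large fraction of the number of tokens.

```python
from collections import Counter, deque
from typing import Iterable


def count_ngrams(lines: Iterable[str], n: int) -> Counter:
    """Count n-grams of whitespace-separated tokens, one line at a time.

    Hypothetical helper for illustration. The sliding window is NOT reset
    at line boundaries, so the corpus is treated as one token stream.
    Memory use is dominated by the Counter, i.e. by distinct n-grams.
    """
    counts: Counter = Counter()
    window: deque = deque(maxlen=n)  # holds only the last n tokens
    for line in lines:
        for token in line.split():
            window.append(token)
            if len(window) == n:
                counts[tuple(window)] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up sample; a real corpus would be read from a file,
    # e.g. `with open(path) as f: count_ngrams(f, 3)`.
    sample = ["the cat sat on the mat", "the cat sat down"]
    for gram, freq in count_ngrams(sample, 2).most_common(3):
        print(" ".join(gram), freq)
```

Keying the table on tuples of strings is convenient but not compact:
each distinct n-gram costs a tuple plus a dictionary entry, which is
exactly the part that outgrows a small machine.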

So...if someone with much greater computing resources than I has come
up with bigram and trigram frequency lists I'd love to hear about
it. It would be ideal if such counts were available for the ACL/DCI
WSJ corpus, as that is the corpus I've been working with.

Regards
Ted

--
* Ted Pedersen                     pedersen@seas.smu.edu              *
*                                  http://www.seas.smu.edu/~pedersen/ *
* Department of Computer Science and Engineering,                     *
* Southern Methodist University, Dallas, TX 75275      (214) 768-3712 *