Re: word frequency lists?

Ted Pedersen
Sun, 26 Nov 1995 09:37:33 -0600 (CST)

A small point in favor of frequency lists...

In trying to come up with frequency lists for bigrams and trigrams, I
find that once the corpus size hits 100,000 words I run out of memory
on the computer. While I might be able to tweak my program and get
that number up to 200,000 or maybe 500,000 (doubt it), I think the
system limitations here will prevent me from coming up with bigram
and trigram counts for a 1,000,000 word corpus.
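
For concreteness, here is a minimal sketch of the kind of counting
involved, written in Python. It is not the program described above; the
function names (tokens, ngram_counts) and the sample text are made up
for illustration, and tokenization is assumed to be plain whitespace
splitting with lowercasing. It reads the text as a stream and keeps only
the last n tokens in a sliding window, so the corpus itself never has to
sit in memory.

```python
from collections import Counter, deque
from typing import Iterable, Iterator


def tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield lowercased, whitespace-separated tokens one line at a time."""
    for line in lines:
        for tok in line.split():
            yield tok.lower()


def ngram_counts(token_stream: Iterable[str], n: int) -> Counter:
    """Count n-grams with a sliding window holding the last n tokens.

    Note: the window runs across line boundaries, so the last word of one
    line and the first word of the next form a bigram.
    """
    window = deque(maxlen=n)  # oldest token drops out automatically
    counts = Counter()
    for tok in token_stream:
        window.append(tok)
        if len(window) == n:
            counts[tuple(window)] += 1
    return counts


# Hypothetical two-line sample; a real run would pass an open file object.
sample = ["the cat sat on the mat", "the cat ran"]
bigrams = ngram_counts(tokens(sample), 2)
trigrams = ngram_counts(tokens(sample), 3)
```

The table of counts still has to fit in memory, and for trigrams it
grows quickly with corpus size, so a sketch like this only removes the
cost of holding the text, not the cost of holding the counts.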

So...if someone with much greater computing resources than I has come
up with bigram and trigram frequency lists I'd love to hear about
it. It would be ideal if such counts were available for the ACL/DCI
WSJ corpus as that is the corpus I've been working with.


* Ted Pedersen                         *
* Department of Computer Science and Engineering,                     *
* Southern Methodist University, Dallas, TX 75275      (214) 768-3712 *