that sounds like the correct explanation. i suspected that this might
be the case.
to amplify what paul is saying, with the example i used before:
             word A    other words
           +----------------------
  corpus 1 |      1            999
  corpus 2 |      1         999999
chi^2 should not be applied here, since it gives pretty bozoid results.
the situation that paul rayson is talking about, however, is a bit more
like the following:
             word A    other words
           +----------------------
  corpus 1 |    150        1000000
  corpus 2 |   1000       10000000
In this case, Pearson's chi^2 gives a score of 21.74 while the
log-likelihood ratio gives 19.40. Clearly, these are much more
comparable in this case. This shows how in certain kinds of
corpus frequency comparisons, the traditional chi^2 measure is
perfectly fine.
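
For anyone who wants to check these figures, here is a minimal Python sketch of both statistics for a 2x2 table. It assumes the usual textbook definitions (expected counts from the row and column totals, and G^2 = 2 * sum of O * ln(O/E)); the function names are made up for illustration and do not come from any particular package.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence, from row/column totals."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def pearson_chi2(table):
    """Pearson's chi^2: sum over cells of (O - E)^2 / E."""
    exp = expected_counts(table)
    return sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))

def log_likelihood_ratio(table):
    """Log-likelihood ratio G^2: 2 * sum over cells of O * ln(O / E)."""
    exp = expected_counts(table)
    return 2 * sum(table[i][j] * math.log(table[i][j] / exp[i][j])
                   for i in range(2) for j in range(2)
                   if table[i][j] > 0)  # empty cells contribute nothing

# Rows are corpus 1 and corpus 2; columns are word A and other words.
first_table = [[1, 999], [1, 999999]]
second_table = [[150, 1000000], [1000, 10000000]]

for name, table in (("first table", first_table),
                    ("second table", second_table)):
    print(f"{name}: chi^2 = {pearson_chi2(table):.2f}, "
          f"G^2 = {log_likelihood_ratio(table):.2f}")
```

On the second table this should reproduce the 21.74 and 19.40 quoted above. On the first table the two statistics come out far apart, which is the disagreement that makes chi^2 untrustworthy there.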
It should be noted that even though this level of association is
virtually impossible to have arisen by chance, single-cell mutual
information gives a score of only 0.52 (compared to the not terribly
exceptional case in the first table, where it gave a score of 8.97).
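
The mutual information scores can be checked the same way. This sketch assumes that "single cell mutual information" means log2(O/E) for the (corpus 1, word A) cell, with E taken from the row and column totals. That reading is an assumption on my part, though it does match the two scores quoted. The function name is hypothetical.

```python
import math

def cell_mutual_information(table, i=0, j=0):
    """log2(observed / expected) for cell (i, j) of a contingency table.

    Assumed definition: expected = row total * column total / grand total.
    """
    n = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = sum(row[j] for row in table)
    expected = row_total * col_total / n
    return math.log2(table[i][j] / expected)

# Rows are corpus 1 and corpus 2; columns are word A and other words.
first_table = [[1, 999], [1, 999999]]
second_table = [[150, 1000000], [1000, 10000000]]

print(f"first table:  {cell_mutual_information(first_table):.2f}")
print(f"second table: {cell_mutual_information(second_table):.2f}")
```

The score depends only on the ratio of observed to expected counts, not on how much data backs that ratio up, so a single occurrence in a small corpus can outscore 150 occurrences in a large one.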