Re: comparisons in text corpora: keywords / CHI square

Paul Rayson (paul@comp.lancs.ac.uk)
Fri, 30 Aug 1996 10:25:03 +0100 (BST)

Ted,

Of course I fully agree with you about the low (expected) frequency problems
with chi-squared tests. In our study I implemented log-likelihood alongside the
chi-squared value. For most of the words we were interested in (relative
frequency above 0.005%), the difference between the chi-squared value and the
log-likelihood value was at most 3%. Possibly the problem didn't arise because
we were comparing roughly equal-sized subcorpora of the BNC.
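For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of the two statistics for a single word. It assumes a 2x2 contingency table (target word vs. all other words, corpus 1 vs. corpus 2), Pearson's chi-squared without Yates' correction, and the G-squared form of log-likelihood summed over all four cells. Some keyword tools sum over the word cells only; this is not the code used in our study, and the function names and counts are hypothetical.

```python
import math


def observed_table(freq1, size1, freq2, size2):
    # Rows: [target word, all other words]; columns: [corpus 1, corpus 2].
    return [[freq1, freq2], [size1 - freq1, size2 - freq2]]


def expected_table(obs):
    # Expected count under independence: row total * column total / grand total.
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    n = sum(rows)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_squared(freq1, size1, freq2, size2):
    # Pearson chi-squared over all four cells, no continuity correction.
    # Assumes the word occurs at least once, so no expected count is zero.
    obs = observed_table(freq1, size1, freq2, size2)
    exp = expected_table(obs)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(obs, exp)
               for o, e in zip(orow, erow))


def log_likelihood(freq1, size1, freq2, size2):
    # G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing.
    obs = observed_table(freq1, size1, freq2, size2)
    exp = expected_table(obs)
    return 2 * sum(o * math.log(o / e)
                   for orow, erow in zip(obs, exp)
                   for o, e in zip(orow, erow) if o > 0)


if __name__ == "__main__":
    # Hypothetical counts in two equal-sized 1,000,000-word corpora:
    # one word above 0.005% relative frequency, one very rare word.
    size = 1_000_000
    for f1, f2 in [(100, 150), (1, 5)]:
        chi2 = chi_squared(f1, size, f2, size)
        g2 = log_likelihood(f1, size, f2, size)
        print(f"{f1} vs {f2}: chi2={chi2:.3f}  G2={g2:.3f}  "
              f"diff={abs(g2 - chi2) / chi2:.1%}")
```

With these made-up counts the two statistics nearly coincide for the frequent word, while the gap is noticeably larger for the rare word, which is where the low expected frequency problem shows up.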

Paul.