Re: Chi-Square

george@scs.leeds.ac.uk
Thu, 13 Mar 1997 14:23:53 GMT

>
> Dear List Members,
>
> I am currently working on the question of whether there is a
> correlation between certain grammatical choices concerning English
> verbs and the lexical environment they are in. Let me briefly
> illustrate this: it can be shown from corpus data that the verb
> 'bring' in collocation with the noun 'charges' as object occurs in
> the passive far more often than it does on average without looking
> at any particular collocation.
>
> In order to show correlations of this type on a statistically sound
> basis I thought that the chi-square test might be appropriate. I
> would be grateful for any hints as to the appropriateness of this
> test for my purposes. Any information about literature describing
> the use of the chi-square test in connection with lexical
> distributions would be most welcome, as well.
>
> thanks
> Mike

I was surprised to see that only a few people have paid attention to the
nature of statistical tests, both last year (when the subject was discussed
on this same list) and now. There seems to be a widespread misconception
about the proper interpretation of the results of such tests. It is known
from classical statistics that tests like the chi-square, g2 and the rest are
based on differences between data and can only reveal whether a difference
between two sets of data exists. Adam Kilgarriff quite rightly wrote that

"...Chi-square is a test for saying
whether we can be confident that something is non-random. Most if not
all language phenomena are non-random, so, unless you are very short
on data, chi-square will confirm that we can be confident that the
association is non-random -- but that is not of great interest."

However, with all due respect to the work of others (Ted Dunning, Kilgarriff
and others) which highlighted some of the problems of such tests, alternative
tests like the g2 or Mann-Whitney do not escape the same trap. Yes, it may be
true that the g2 test does not overestimate the differences as much as the
chi-square test for small samples, but this is no reason to argue that the g2
test can safely be used to assess the association between two variables. It
can only indicate whether a difference exists. Given that nothing in nature is
completely independent of anything else, some difference will always exist,
and with a large enough sample a significant result can always be obtained.
The same is true, I repeat, for other tests, including the Mann-Whitney test
as well as the family of parametric tests.
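
To make the sample-size point concrete, here is a minimal Python sketch. It is
not taken from any package; the function names and the counts are invented
purely for illustration. It computes the Pearson chi-square (without
continuity correction) and the phi coefficient for a 2x2 table, then
multiplies every cell by 10 and by 100. The proportions are unchanged, so phi
does not move, but the chi-square grows in step with the sample.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


def phi_coefficient(a, b, c, d):
    """Phi: square root of chi-square divided by the sample size."""
    n = a + b + c + d
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)


# Hypothetical counts, invented for illustration only.
base_table = (52, 48, 48, 52)

for scale in (1, 10, 100):
    table = tuple(scale * cell for cell in base_table)
    chi2 = chi_square_2x2(*table)
    phi = phi_coefficient(*table)
    print(f"N={sum(table):6d}  chi-square={chi2:6.2f}  phi={phi:.3f}")
```

With these made-up counts the chi-square goes from 0.32 to 3.2 to 32, the last
of which is well past the 0.001 critical value of 10.83 for one degree of
freedom, while phi stays at 0.04 throughout.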

When one wants to assess the degree of association between two variables, a
safer bet is to use an association coefficient, which is not affected by
sample size. A number of such coefficients exist (for example phi, Cramer's
phi, lambda, the uncertainty coefficient etc.) and they can be computed fairly
easily in statistical packages like SPSS.
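
For readers without a statistical package at hand, two of these coefficients
can be sketched in a few lines of Python. This is only an illustration under
my own assumptions: the function names and the example table are hypothetical,
and the formulas are the usual textbook ones for Cramer's V (also known as
Cramer's phi) and Goodman and Kruskal's lambda for predicting the column
variable from the row variable.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square for an r x c table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Cramer's V: chi-square rescaled by sample size and table shape."""
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(pearson_chi_square(table) / (n * (smaller_dim - 1)))


def goodman_kruskal_lambda(table):
    """Lambda for predicting the column category once the row is known."""
    n = sum(sum(row) for row in table)
    best_guess_without_row = max(sum(col) for col in zip(*table))
    best_guess_with_row = sum(max(row) for row in table)
    return (best_guess_with_row - best_guess_without_row) / (n - best_guess_without_row)


# Hypothetical 2x2 counts, invented for illustration only.
example = [[30, 10],
           [10, 50]]
print("Cramer's V:", round(cramers_v(example), 3))
print("lambda:    ", round(goodman_kruskal_lambda(example), 3))
```

Both functions assume that every row and column total is non-zero and that
the column variable has more than one observed category.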

To give an example based on Mike's problem, the chi-square or g2 tests can
provide evidence that "charges" occurs after "bring" more often than chance
would predict, but they cannot tell us how strongly related the two words are,
even though the result may be significant at the 0.001 level. Using the
uncertainty coefficient (you may use some other coefficient depending on the
problem), one could estimate the relative reduction in uncertainty when
predicting that "charges" will follow given the word "bring" (compared with
the uncertainty in predicting that "charges" will follow when we do not know
the previous word). If the value of the uncertainty coefficient is close to
zero, little or no association exists. If, on the other hand, the value is
close to 1, there is an indication that a strong association (collocation) has
been found.
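
As a rough sketch of how the uncertainty coefficient could be computed by
hand, the Python below takes a contingency table whose rows are the known
variable (is the previous word "bring" or not) and whose columns are the
variable being predicted (is the next word "charges" or not). The function
names and the counts are hypothetical and not drawn from any corpus. The
coefficient is the usual ratio (H(Y) - H(Y|X)) / H(Y), where H is the entropy.

```python
import math


def entropy(counts):
    """Shannon entropy (in nats) of a list of counts; zero cells are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(table):
    """U(column | row): share of column uncertainty removed by knowing the row."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    h_col = entropy(col_totals)
    if h_col == 0:
        return 0.0  # only one column category observed: nothing to predict
    h_col_given_row = sum(
        (sum(row) / n) * entropy(row) for row in table if sum(row) > 0
    )
    return (h_col - h_col_given_row) / h_col


# Hypothetical bigram counts, invented for illustration only.
# Rows: previous word is "bring" / any other word.
# Columns: next word is "charges" / any other word.
bigram_counts = [[40, 960],
                 [60, 998940]]
print("U(next word | previous word):", uncertainty_coefficient(bigram_counts))
```

A table with independent rows and columns gives 0, and a table in which the
row completely determines the column gives 1.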

To cut a long story short, you can use the chi-square, g2, Mann-Whitney,
t-test or whatever to identify differences, but be aware that this does not
tell you anything about the strength of association between the variables, and
the result may have been affected by the sample size (producing a type I or
type II error). Given enough experimental data, if such a test has produced
evidence of an association between the variables, a safer bet is to go on and
try to assess the strength of this association. If the result still indicates
that an association exists, then you have found something significant.
Otherwise, your test has probably overestimated the differences, and it is up
to you to convince others that the particular test you used was appropriate
for the task and that the results are really significant.
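
The two-step procedure just described might be sketched as follows, here with
the g2 statistic as the test and the uncertainty coefficient as the measure of
strength. Everything named in the code is my own assumption for illustration:
the function names, the choice of the 0.05 level, and the restriction to 2x2
tables (one degree of freedom, critical value 3.841). The sketch relies on the
identity g2 = 2 * N * (H(Y) - H(Y|X)), so the coefficient can be obtained by
dividing g2 by 2 * N * H(Y).

```python
import math

# Chi-square critical value for 1 degree of freedom at the 0.05 level.
# It applies to 2x2 tables only; other table shapes need another value.
CRITICAL_VALUE_05_DF1 = 3.841


def g2_statistic(table):
    """Log-likelihood ratio statistic (g2) for a table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def column_entropy(table):
    """Entropy (in nats) of the column variable."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    return -sum((c / n) * math.log(c / n) for c in col_totals if c > 0)


def two_step_report(table):
    """Step 1: is there a difference? Step 2: how strong is the association?"""
    n = sum(sum(row) for row in table)
    g2 = g2_statistic(table)
    h_col = column_entropy(table)
    strength = g2 / (2.0 * n * h_col) if h_col > 0 else 0.0
    return {
        "g2": g2,
        "significant_at_05": g2 > CRITICAL_VALUE_05_DF1,
        "uncertainty_coefficient": strength,
    }


# Hypothetical counts, invented for illustration only.
small_sample = [[52, 48], [48, 52]]
large_sample = [[5200, 4800], [4800, 5200]]
print(two_step_report(small_sample))
print(two_step_report(large_sample))
```

The two tables have identical proportions. Only the larger one passes the
significance test, yet the uncertainty coefficient is the same tiny value in
both cases, which is exactly the situation in which the test result alone
would be misleading.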

============================================================================
George C. Demetriou
Centre for Computer Analysis of Language And Speech (CCALAS)
& Artificial Intelligence Division, School of Computer Studies

phone: +44 1132 336827 Leeds University
FAX: +44 1132 335468 Leeds LS2 9JT
Email: george@scs.leeds.ac.uk United Kingdom
============================================================================

P.S. Information about the use of association coefficients can be found in
many statistics textbooks. Hays' "Statistics for the Social Sciences" gives
enough information about the differences between a test like the chi-square
and an association coefficient (such information can also be found in many
other books, as well as in the reference manuals of statistical packages).
E-mail me for more on this if needed.