Re: Corpora: Statistical significance of tagging differences

James L. Fidelholtz (jfidel@siu.buap.mx)
Fri, 19 Mar 1999 16:27:34 -0600 (CST)

On Fri, 19 Mar 1999, Paul Rayson wrote:
...
>Can you direct me to a book or article which says chi-square is designed for
>small numbers?

Paul,
I'm not specifically familiar with the book that Ted mentions,
but Cressie is certainly a well-known name. Anyway, the deal with chi
square is that (if memory serves--it's been some years since I actually
applied a chi-squared test) for each cell in a table you work out the
count you would EXPECT if rows and columns were independent (row total
times column total, divided by the grand total), subtract it from the
count you actually observed, square the difference, and divide by the
expected count. Chi square is then the SUM of these values over all the
cells, and obviously every cell whose observed count differs from its
expected count makes a positive contribution to chi squared. SO, if you
have bigger entries in the same unequal proportions, you will get a
bigger result: multiply every count by 10 and chi square is multiplied
by 10 as well. Chi square is 'significant' beyond a certain positive
value (which depends only on the degrees of freedom), and if the numbers
are big enough, it is almost guaranteed to be significant.
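The point about bigger counts giving a bigger chi square can be checked
with a short sketch. The Python below assumes the standard Pearson
formula, the sum over cells of (observed - expected)^2 / expected, with
expected counts taken from the row and column totals. The function name
chi_square and both example tables are made up for illustration; they
are not data from this thread.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts.

    `table` is a list of rows, each row a list of non-negative counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 tables with identical proportions (60/40 vs 40/60);
# the second is simply the first with every count multiplied by 100.
small = [[12, 8], [8, 12]]
big = [[1200, 800], [800, 1200]]

print(chi_square(small))  # about 1.6
print(chi_square(big))    # about 160
```

With 1 degree of freedom the 5% critical value is about 3.84, so the
small table falls short of significance while the large one, showing
exactly the same proportions, clears it easily.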
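By the way, note that ANOVA and many other more complicated
statistical procedures normalize by a variance estimate and its degrees
of freedom, so they are less at the mercy of raw counts than chi square
is, although a large enough sample will still make small effects come
out significant. Basically, it's a question of what you divide by, if
anything.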
Anyhoo, and even though I and the psychologists use an
occasional statistical test so we don't look 'unscientific' to the
NONcognoscenti, I always maintain that if the table itself doesn't SEEM
at a glance to show significant results, statistical tests are not
likely to be much help. This is of course much less true for
complicated interactional tests like ANOVA, etc., which can indeed show
you unexpected dependencies within your data (always, of course,
assuming that you arranged your data in an intelligent fashion in the
first place). The short caveat, then, is that statistical tests are no
substitute for some initial thought processes.
If Ted's and others' suggestions don't fill the bill for you,
email me back. I have a few good books on statistics (esp. for social
scientists), and I'll be glad to look up the references for you at home.
Jim

James L. Fidelholtz e-mail: jfidel@siu.buap.mx
Maestría en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO