Corpora: statistical tests

Marc Light (light@linus.mitre.org)
Wed, 17 Mar 1999 11:13:47 -0500 (EST)

Hi Mark,

We've been working on a similar problem: we have different versions of
a system that take reading comprehension exams (4th grade) and we want
to know whether the performance differences among the configurations
are statistically significant.

We've been working with the method mentioned and used in MUC3:

Nancy Chinchor, Lynette Hirschman, and David D. Lewis
"Evaluating Message Understanding Systems: An analysis of MUC3"
Computational Linguistics, Volume 19, No. 3, 1993

It involves an approximate randomization test. As I understand it,
one advantage of this method is that it does not assume a particular
distribution (e.g., normal) for the values of the statistic used to
measure system performance.
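
For illustration, here is a minimal sketch of a paired approximate
randomization test. It is not taken from the MUC3 paper; it assumes
each configuration gets one score per exam question (e.g., 1 = correct,
0 = wrong) and that the statistic is the difference in total score.
The function name and its parameters are hypothetical.

```python
import random

def approx_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Estimate a two-sided p-value for the difference in total score
    between two systems evaluated on the same items.

    Assumption: scores_a[i] and scores_b[i] are the two systems'
    scores on item i, so swapping them is valid under the null
    hypothesis that the systems are interchangeable.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same items")

    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))

    at_least_as_extreme = 0
    for _ in range(trials):
        total_a = 0
        total_b = 0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the labels are arbitrary,
            # so swap each pair with probability 1/2.
            if rng.random() < 0.5:
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A call such as approx_randomization_test(config1_scores, config2_scores)
returns an estimated p-value; a small value suggests the observed
difference would be unlikely if the two configurations were
interchangeable. Only the random reassignment of paired scores is
used, so no distributional assumption about the statistic is needed.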

A more complete description is given in

Computer Intensive Methods for Testing Hypotheses: An Introduction
Eric W. Noreen
John Wiley and Sons
1989
ISBN 0-471-61136-0

which I have not read yet but plan to shortly!

I would very much welcome a discussion of the topic of significance
testing, since I personally find it to be crucial yet slippery.

Marc