scaling/norming

lcjohn@usthk.ust.hk ("lcjohn@usthk.ust.hk")
Thu, 30 Nov 1995 23:11:00 +0800

Next message: Dan Melamed: "Re: scaling/norming"
Previous message: lcjohn@usthk.ust.hk: "corpus of examination scripts in English??"
Next in thread: Dan Melamed: "Re: scaling/norming"

What's the doctrine on comparing corpora of different sizes?

I want to compare features (words, n-grams, POS tags, etc.) from a corpus of
0.5 million words of writing by native speakers (NS) of English with a
750,000-word corpus of writing by non-native speakers (NNS). I've been told
that proportional or scaled comparisons are inadvisable, presumably because
word frequencies can't be predicted proportionally (because of Zipf's
curve?). Am I left with no alternative but to throw away material from the
larger corpus?
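
To make the two options concrete, here is a minimal sketch in Python of what
a scaled comparison and an equal-size comparison might look like. The function
names (per_million, downsample) and the toy token lists are hypothetical and
only stand in for the real corpora. Note that the scaling applies to token
counts only; it says nothing about how the number of distinct word types
grows with corpus size.

```python
import random
from collections import Counter

def per_million(counts, corpus_size):
    """Scale raw feature counts to occurrences per million tokens."""
    return {feat: n * 1_000_000 / corpus_size for feat, n in counts.items()}

def downsample(tokens, target_size, seed=0):
    """Randomly keep target_size tokens, sampled without replacement.

    Sampling single tokens is only suitable for word counts; for n-grams
    one would sample whole texts instead (not shown here).
    """
    rng = random.Random(seed)
    return rng.sample(tokens, target_size)

# Toy stand-ins for the two corpora (hypothetical data, sizes in ratio 2:3).
ns_tokens = ["the"] * 50 + ["of"] * 30 + ["corpus"] * 20    # 100 tokens
nns_tokens = ["the"] * 90 + ["of"] * 30 + ["corpus"] * 30   # 150 tokens

# Option 1: compare rates per million tokens instead of raw counts.
ns_rate = per_million(Counter(ns_tokens), len(ns_tokens))
nns_rate = per_million(Counter(nns_tokens), len(nns_tokens))

# Option 2: cut the larger sample down to the size of the smaller one.
nns_equal = downsample(nns_tokens, len(ns_tokens))
```

In this toy case "the" comes out at 500,000 per million in the first list
and 600,000 per million in the second, although the raw counts are 50 and 90.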

John Milton
HKUST
