Re: word frequency lists?

Christer Johansson (christer@sun.ling.lu.se)
Fri, 24 Nov 95 00:11:40 +0100

ted dunning wrote:
>moreover, this coin of domain specificity has another side. it also
>invalidates counts taken from the so-called "balanced" corpora [...]
>by conjoining data from diverse sources, an average count is obtained
>which might be supposed to be better in some sense than the counts
>obtained from any domain specific source.

>this is not true, however. the act of balancing has created a corpus
>which is utterly unlike any real bit of text. thus the frequency
>counts taken from such a balanced corpus cannot be taken as a
>characterization of any real text.

But isn't the point that the *difference* from "expected" ("averaged") counts
can tell us something about the text?

Maybe the difference in frequency between directions is interesting.
Right and left ought to be equally likely alternatives if the starting
point and/or the goal point is more or less random.
The difference in frequency between right (6866) and left (1223) *might*
indicate something about either the task or the subjects' preferences.
(Could this be an effect of right-handedness, from either the task constructors'
or the subjects' point of view?)
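
To make the "difference from expected" idea concrete, here is a rough sketch in Python. It assumes the reference point is a 50/50 split between right and left, which is my assumption for illustration and not something established about the corpus. The function name goodness_of_fit is made up for this example. It computes both Pearson's chi-square and the log-likelihood ratio (G^2) for the two counts quoted above.

```python
import math

def goodness_of_fit(observed, expected_props):
    """Compare observed counts with counts expected under given proportions.

    Returns (chi2, g2): Pearson's chi-square and the log-likelihood
    ratio statistic G^2. Hypothetical helper, for illustration only.
    """
    total = sum(observed)
    expected = [total * p for p in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Cells with a zero count contribute nothing to G^2.
    g2 = 2.0 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)
    return chi2, g2

# Counts quoted in this message.
right, left = 6866, 1223

# Assumed reference point: right and left equally likely.
chi2, g2 = goodness_of_fit([right, left], [0.5, 0.5])
print(f"chi-square = {chi2:.1f}, G^2 = {g2:.1f} (1 degree of freedom)")
```

Both statistics come out far above 3.84, the usual 5% critical value for one degree of freedom, so the counts clearly depart from this particular reference point. That says nothing about *why* they differ, and a different choice of reference point would give a different answer.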

the moral?
Find the (or at least a) (subjective) point of reference!

just my two bits worth

_____________________________________________________________________________

Christer Johansson
avd. Lingvistik
Helgonabacken 12
212 12 Lund
Sweden

email: Christer.Johansson@ling.lu.se
alt: Christer@sun.ling.lu.se
http://www.ling.lu.se/documents/persons/Christer.html
_____________________________________________________________________________