Re: equal corpora sizes??

James Purchase, Language Centre, Tel:358 7862 (LCGUIBI@usthk.ust.hk)
Fri, 19 Apr 1996 09:35:01 +0800

Re: D, the index of dispersion

There were a few typos in the formula.

Here is the correct formula:

D = [log(sum(p_i)) - sum(p_i * log(p_i)) / sum(p_i)] / log(n)

where

n = the number of categories
i = the category number, 1, 2, 3, ..., n
p_i = the probability of a token in the ith category, with
      p_i * log(p_i) taken as 0 for p_i = 0

and each sum runs over i = 1, ..., n.
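
For anyone who wants to check values by hand, here is a minimal Python sketch of the formula as written above. The function name `dispersion` and the list argument `p` are just illustrative choices, and the sketch simply takes the p_i values as given; it does not settle how they should be derived when the categories differ in size.

```python
import math

def dispersion(p):
    """Index of dispersion D for a list of per-category values p_i.

    Implements D = [log(sum p_i) - sum(p_i * log p_i) / sum p_i] / log n,
    with p_i * log(p_i) treated as 0 when p_i = 0.
    """
    n = len(p)
    if n < 2:
        raise ValueError("need at least two categories (log n must be nonzero)")
    if any(x < 0 for x in p):
        raise ValueError("p_i values must be non-negative")
    total = sum(p)
    if total <= 0:
        raise ValueError("at least one p_i must be positive")
    # Skip zero entries: p_i * log(p_i) is taken as 0 for p_i = 0.
    plogp = sum(x * math.log(x) for x in p if x > 0)
    return (math.log(total) - plogp / total) / math.log(n)

print(dispersion([0.25, 0.25, 0.25, 0.25]))  # evenly spread over 4 categories
print(dispersion([1.0, 0.0, 0.0, 0.0]))      # concentrated in one category
```

With these inputs D comes out at 1 (up to rounding) for the perfectly even case and 0 when everything falls in a single category. The base of the logarithm does not matter, since it cancels between the numerator and log n.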

So my question again: is it essential that the token sizes are the same for
all categories?

James Purchase
Language Centre
HKUST