There were a few typos in the formula.
Here is the correct formula:
D = [ log(sum_i p_i) - (sum_i p_i*log(p_i)) / (sum_i p_i) ] / log(n)
where
n = the number of categories
i = the category number, 1, 2, 3, ..., n
p_i = the probability of a token in the ith category, with p_i*log(p_i) = 0 for p_i = 0
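
For anyone who wants to check the formula numerically, here is a minimal Python sketch of it as written above. The function name "dispersion" and the input list are my own assumptions for illustration, not part of any existing package, and natural logs are used (the base cancels in the ratio).

```python
import math

def dispersion(p):
    """Compute D = [log(sum p_i) - (sum p_i*log p_i) / sum p_i] / log n.

    p is a list of non-negative values, one per category (hypothetical input).
    """
    n = len(p)
    if n < 2:
        raise ValueError("need at least two categories, since log(1) = 0")
    total = sum(p)
    if total <= 0:
        raise ValueError("the values must have a positive sum")
    # Convention from the definition: p_i*log(p_i) = 0 when p_i = 0,
    # so zero entries are simply skipped.
    plogp = sum(x * math.log(x) for x in p if x > 0)
    return (math.log(total) - plogp / total) / math.log(n)

if __name__ == "__main__":
    print(dispersion([0.25, 0.25, 0.25, 0.25]))  # evenly spread over 4 categories
    print(dispersion([1.0, 0.0, 0.0, 0.0]))      # everything in one category
```

As far as I can tell, an even spread gives D = 1 and a token confined to a single category gives D = 0. Because of the log(sum p_i) term, the values passed in do not have to sum to 1.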
So my question again: is it essential that the token sizes are the same for
all categories?
James Purchase
Language Centre
HKUST