I have been reviewing some of the similarity measures used to perform word
clustering (Jaccard, Dice, Simple Matching, correlation, etc.), and I have come
to the conclusion that many of these measures have metric problems that
probably make them suboptimal for word clustering.
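For reference, these indices are usually computed from the 2x2 co-occurrence
table of two words across a set of documents: a = both words present, b and
c = only one of them present, d = neither present. The following is a minimal
Python sketch that assumes binary presence/absence vectors. The function names
are illustrative only, and "correlation" is taken here to mean the phi
coefficient (Pearson's r on binary data). Note that Simple Matching counts the
joint absences d, which usually dominate in sparse word-by-document data.

```python
from math import sqrt


def contingency(x, y):
    """Count the 2x2 co-occurrence cells for two binary occurrence vectors."""
    a = sum(1 for i, j in zip(x, y) if i and j)          # both words present
    b = sum(1 for i, j in zip(x, y) if i and not j)      # only the first word
    c = sum(1 for i, j in zip(x, y) if not i and j)      # only the second word
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # neither word
    return a, b, c, d


def jaccard(x, y):
    # Ignores joint absences: a / (a + b + c)
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if a + b + c else 0.0


def dice(x, y):
    # Double weight on joint presences: 2a / (2a + b + c)
    a, b, c, _ = contingency(x, y)
    return 2 * a / (2 * a + b + c) if a + b + c else 0.0


def simple_matching(x, y):
    # Counts joint absences as agreement: (a + d) / n
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


def phi(x, y):
    # Pearson correlation on binary data: (ad - bc) / sqrt of marginal products
    a, b, c, d = contingency(x, y)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0
```

For example, with x = [1, 1, 0, 0, 1, 0] and y = [1, 0, 0, 0, 1, 1] the table
is a = 2, b = 1, c = 1, d = 2, giving Jaccard 0.5, Dice and Simple Matching
about 0.667, and phi about 0.333.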
I am now working on modified versions of these indices, and I need some
way to benchmark the new similarity measures. I would like to have a
series of benchmarks for several kinds of applications (dimension reduction,
automatic identification of themes, automatic taxonomy development, etc.).
I would appreciate suggestions on ways to benchmark these new measures and
compare their performance with the more traditional ones. Any ideas,
references, or data sets would be welcome.
I am also looking for existing articles in which these measures have been
compared, either empirically or theoretically.
Thanks,
Normand Peladeau
Provalis Research
This archive was generated by hypermail 2b29 : Fri Oct 08 2004 - 21:16:21 MET DST