[Corpora-List] Developing and testing new similarity measures for word clustering

From: Normand Peladeau (peladeau@simstat.com)
Date: Fri Oct 08 2004 - 14:47:02 MET DST


    I have been reviewing some of the similarity measures used to perform word
    clustering (Jaccard, Dice, Simple Matching, correlation, etc.), and I have
    come to the conclusion that many of these measures have metric problems
    that probably make them suboptimal for word clustering.
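    For reference, here is a minimal sketch of the standard textbook
    definitions of the measures named above, computed on binary occurrence
    vectors (1 if a word occurs in a given document or context window, 0
    otherwise). The binary representation and the function names are
    assumptions for illustration; this is not Provalis's implementation, and
    it does not show the modified indices mentioned in this post. For binary
    vectors, Pearson correlation reduces to the phi coefficient.

    ```python
    import math

    # Assumption: each word is represented as a binary occurrence vector
    # over the same sequence of documents (or context windows).

    def contingency(x, y):
        """Return the 2x2 table (a, b, c, d) for two binary vectors.

        a: both words occur, b: only x occurs,
        c: only y occurs,    d: neither word occurs.
        """
        if len(x) != len(y):
            raise ValueError("vectors must have the same length")
        a = sum(1 for u, v in zip(x, y) if u and v)
        b = sum(1 for u, v in zip(x, y) if u and not v)
        c = sum(1 for u, v in zip(x, y) if not u and v)
        d = len(x) - a - b - c
        return a, b, c, d

    def jaccard(x, y):
        # Shared occurrences over all units where at least one word occurs.
        a, b, c, _ = contingency(x, y)
        denom = a + b + c
        return a / denom if denom else 0.0

    def dice(x, y):
        # Like Jaccard, but shared occurrences are weighted twice.
        a, b, c, _ = contingency(x, y)
        denom = 2 * a + b + c
        return 2 * a / denom if denom else 0.0

    def simple_matching(x, y):
        # Counts joint absences (d) as agreement, unlike Jaccard and Dice.
        a, b, c, d = contingency(x, y)
        n = a + b + c + d
        return (a + d) / n if n else 0.0

    def phi(x, y):
        # Pearson correlation of two binary vectors (phi coefficient).
        a, b, c, d = contingency(x, y)
        denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
        return (a * d - b * c) / denom if denom else 0.0

    if __name__ == "__main__":
        x = [1, 1, 0, 0, 1, 0]
        y = [1, 0, 0, 0, 1, 1]
        for name, f in [("jaccard", jaccard), ("dice", dice),
                        ("simple_matching", simple_matching), ("phi", phi)]:
            print(f"{name}: {f(x, y):.3f}")
        # jaccard: 0.500, dice: 0.667, simple_matching: 0.667, phi: 0.333
    ```

    Note that Simple Matching and phi both use the joint-absence count d,
    whereas Jaccard and Dice ignore it, which is one reason these measures
    can rank the same word pairs differently.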

    I am now working on modified versions of these indices, and I need some
    way to benchmark the new similarity measures. Ideally, I would like a
    series of benchmarks for several kinds of applications (dimension
    reduction, automatic identification of themes, automatic taxonomy
    development, etc.).

    I would welcome suggestions on how to benchmark these new measures and
    compare their performance with the more traditional ones. Any ideas,
    references, or data sets would be appreciated.

    I am also looking for existing articles in which these measures have been
    compared, either empirically or theoretically.

    Thanks,

    Normand Peladeau
    Provalis Research



    This archive was generated by hypermail 2b29 : Fri Oct 08 2004 - 21:16:21 MET DST