[Corpora-List] Developing and testing new similarity measures for word clustering

From: Normand Peladeau (peladeau@simstat.com)
Date: Fri Oct 08 2004 - 14:47:02 MET DST

Next message: Mark P. Line: "Re: [Corpora-List] Developing and testing new similarity measures for word clustering"

Previous message: Jesus Angel Gimenez Linares: "[Corpora-List] SVMTool v1.2.1"
Next in thread: Mark P. Line: "Re: [Corpora-List] Developing and testing new similarity measures for word clustering"
Reply: Mark P. Line: "Re: [Corpora-List] Developing and testing new similarity measures for word clustering"
Reply: Dinoj Surendran: "Re: [Corpora-List] Developing and testing new similarity measures for word clustering"
Reply: Adam Kilgarriff: "RE: [Corpora-List] Developing and testing new similarity measures for word clustering"
Reply: Eric Atwell: "Re: [Corpora-List] Developing and testing new similarity measures for word clustering"

I have been reviewing some of the similarity measures used to perform word
clustering (Jaccard, Dice, Simple Matching, correlation, etc.) and have come
to the conclusion that many of those measures have metric problems that
probably make them suboptimal for word clustering.
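
For readers unfamiliar with the indices named above, here is a minimal
sketch of their textbook definitions for two words represented as binary
occurrence vectors (1 = the word appears in that document). The function
names and the toy vectors are assumptions made for illustration; "phi" is
used here as the binary case of the Pearson correlation, which may or may
not be the correlation variant meant above.

```python
from math import sqrt

def contingency(x, y):
    """Return the 2x2 table counts (a, b, c, d) for two binary vectors."""
    a = sum(1 for i, j in zip(x, y) if i and j)          # both words present
    b = sum(1 for i, j in zip(x, y) if i and not j)      # only first word present
    c = sum(1 for i, j in zip(x, y) if not i and j)      # only second word present
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # both words absent
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0

def dice(x, y):
    a, b, c, _ = contingency(x, y)
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0

def simple_matching(x, y):
    a, b, c, d = contingency(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0

def phi(x, y):
    """Pearson correlation of two binary vectors (phi coefficient)."""
    a, b, c, d = contingency(x, y)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Toy example (invented data): two words over eight documents.
word_x = [1, 1, 0, 0, 0, 0, 0, 0]
word_y = [1, 0, 1, 0, 0, 0, 0, 0]

print(round(jaccard(word_x, word_y), 3))          # 0.333
print(round(dice(word_x, word_y), 3))             # 0.5
print(round(simple_matching(word_x, word_y), 3))  # 0.75
print(round(phi(word_x, word_y), 3))              # 0.333
```

On this toy pair the four indices disagree noticeably, and Simple Matching
is the highest because it counts the five documents in which neither word
occurs. This only illustrates how the indices differ; it is not a statement
of the specific problems referred to above.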

I am now working on modified versions of those indices and I need some
ways to benchmark the new similarity measures. I would like to have a
series of benchmarks for several kinds of applications (dimension reduction,
automatic identification of themes, automatic taxonomy development, etc.).

I would like suggestions for ways to benchmark those new measures and
compare their performance with the more traditional ones. Any ideas,
references, or data sets would be welcome.
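
To make the kind of comparison meant here concrete, the following is a
rough sketch of one possible benchmark shape, not an established protocol:
score each measure by how often a word's most similar neighbour belongs to
the same class in a hand-made gold standard. Everything in it is an
assumption for illustration: the function names, the tiny invented data
set, and the choice of nearest-neighbour precision as the score.

```python
def jaccard(x, y):
    """Jaccard index on sets of document ids."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0

def make_simple_matching(n_docs):
    """Build a Simple Matching function for a collection of n_docs documents."""
    def simple_matching(x, y):
        both = len(x & y)
        neither = n_docs - len(x | y)
        return (both + neither) / n_docs
    return simple_matching

def nearest_neighbour_precision(occurrences, gold, sim):
    """Fraction of words whose most similar other word shares their gold class."""
    hits = 0
    for word, docs in occurrences.items():
        others = [w for w in occurrences if w != word]
        best = max(others, key=lambda w: sim(docs, occurrences[w]))
        hits += gold[best] == gold[word]
    return hits / len(occurrences)

# Invented toy data: the documents (ids 0..7) in which each word occurs.
occurrences = {
    "cat": {0, 1, 2},
    "dog": {0, 1, 3},
    "bank": {4, 5, 6},
    "loan": {4, 5, 7},
}
# Hypothetical gold-standard classes for the same words.
gold = {"cat": "animal", "dog": "animal", "bank": "finance", "loan": "finance"}

measures = {"jaccard": jaccard, "simple_matching": make_simple_matching(8)}
scores = {name: nearest_neighbour_precision(occurrences, gold, sim)
          for name, sim in measures.items()}
print(scores)
```

A new measure would be added to the `measures` dictionary and scored on the
same data; a realistic benchmark would of course need a much larger
vocabulary and an independently built gold standard.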

I am also looking for existing articles in which those measures have been
compared (either empirically or theoretically).

Thanks,

Normand Peladeau
Provalis Research

