Corpora: Performance measures of text categorization

From: Fuchun Peng (f3peng@ai.uwaterloo.ca)
Date: Mon Mar 04 2002 - 23:32:59 MET


Dear List members:

I have a question about performance measures for text
categorization.

The standard performance measure in text categorization is the breakeven
point, defined as the point where precision equals recall. The purpose
is to balance precision and recall. But such a point is normally not
observed directly in experiments, so people have to use interpolation
(or extrapolation) to obtain it from the precision-recall curve.
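
As a minimal sketch of the interpolation step, assuming the curve is
available as a list of measured (precision, recall) pairs ordered by
decision threshold, and assuming linear interpolation between adjacent
pairs (the function name and the input layout are illustrative, not
taken from any particular paper or toolkit):

```python
def breakeven_point(points):
    """Estimate the precision/recall breakeven point.

    points: list of (precision, recall) pairs ordered by threshold
    (an assumed input layout). Returns the interpolated value where
    precision equals recall, or None if the curve never crosses, in
    which case extrapolation would be needed.
    """
    if not points:
        return None
    for (p0, r0), (p1, r1) in zip(points, points[1:]):
        d0, d1 = p0 - r0, p1 - r1
        if d0 == 0:
            # A measured point already has precision == recall.
            return p0
        if d0 * d1 < 0:
            # The difference changes sign between the two points:
            # interpolate linearly to where it reaches zero.
            t = d0 / (d0 - d1)
            return p0 + t * (p1 - p0)
    p_last, r_last = points[-1]
    return p_last if p_last == r_last else None


# Hypothetical measurements at two thresholds.
estimate = breakeven_point([(0.9, 0.5), (0.7, 0.8)])
print(round(estimate, 4))
```

With the two hypothetical points above, precision minus recall goes
from 0.4 to -0.1, so the crossing lies four fifths of the way from the
first point to the second.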

In the IR community, people often use the F-measure to balance precision
and recall. The F-measure is defined as
"2*precision*recall/(precision+recall)".

Computing the breakeven point (by interpolation) is more difficult than
computing the F-measure (a simple formula), so I do not see any
advantage of the breakeven point over the F-measure. One reason people
keep using the breakeven point may be that they have to compare their
results with those of previous researchers, who measured performance
with the breakeven point. But besides this, does anybody know of any
arguments for why the breakeven point, rather than the F-measure, should
be used in text categorization?
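
For comparison, the F-measure formula quoted above can be written as a
short sketch. The function name is illustrative, and returning 0.0 when
both precision and recall are zero is an assumed convention, since the
formula itself is undefined in that case:

```python
def f_measure(precision, recall):
    """Balanced F-measure: 2*precision*recall/(precision+recall)."""
    if precision + recall == 0:
        # Assumed convention: the formula is undefined here.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values for a single classifier setting.
print(round(f_measure(0.9, 0.5), 4))
```

Note that whenever precision equals recall, the formula reduces to that
common value, so at the breakeven point the two measures coincide.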

Best regards

Fuchun

--------------------------------------------------------
Fuchun Peng PhD candidate
Computer Science Department, University of Waterloo
Waterloo, Ontario, Canada, N2L 3G1
1-519-888-4567 ext 5392 f3peng@ai.uwaterloo.ca
http://ai.uwaterloo.ca/~f3peng


This archive was generated by hypermail 2b29 : Tue Mar 05 2002 - 15:30:33 MET