Corpora: Performance measures of text categorization

From: Fuchun Peng (f3peng@ai.uwaterloo.ca)
Date: Mon Mar 04 2002 - 23:32:59 MET


    Dear List members:

    I have a question about the performance measures used in text
    categorization.

    The standard performance measure in text categorization is the breakeven
    point, which is defined as the point where precision equals recall. The
    reason for using it is to balance precision and recall. But such a point
    usually does not coincide with any of the points measured in an
    experiment, so people have to use interpolation (or extrapolation) to
    estimate it from the precision-recall curve.
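    A minimal Python sketch of the interpolation step, assuming the
    precision-recall curve is available as a list of (precision, recall)
    pairs measured at several decision thresholds and ordered by increasing
    recall. The function name breakeven_point is hypothetical, and linear
    interpolation between the two neighbouring points where precision minus
    recall changes sign is only one possible convention; the sketch returns
    None rather than extrapolating.

```python
from typing import List, Optional, Tuple


def breakeven_point(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate the precision/recall breakeven point (hypothetical helper).

    Assumes `curve` holds (precision, recall) pairs ordered by increasing
    recall, e.g. measured while sweeping a decision threshold.
    Returns None if precision - recall never changes sign, i.e. when the
    point could only be found by extrapolation.
    """
    for (p0, r0), (p1, r1) in zip(curve, curve[1:]):
        d0, d1 = p0 - r0, p1 - r1
        if d0 == 0:
            return p0  # an observed point already has P == R
        if (d0 > 0) != (d1 > 0) or d1 == 0:
            # Linear interpolation: find t in (0, 1] where P - R reaches zero.
            # At that t, precision and recall are equal by construction.
            t = d0 / (d0 - d1)
            return p0 + t * (p1 - p0)
    # The last point may itself be the breakeven point.
    if curve and curve[-1][0] == curve[-1][1]:
        return curve[-1][0]
    return None


# Illustrative (made-up) curve: P - R goes from +0.2 to -0.2 between
# the second and third points, so the estimate lies halfway between them.
curve = [(0.9, 0.3), (0.8, 0.6), (0.6, 0.8), (0.4, 0.95)]
print(breakeven_point(curve))  # ≈ 0.7
```

    Even this simple version needs a convention for the case where no
    crossing is observed, which is part of why the breakeven point is more
    awkward to compute than a closed-form score.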

    In the IR community, people often use the F-measure to balance precision
    and recall. The F-measure, with precision and recall weighted equally, is
    defined as "2*precision*recall/(precision+recall)".
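    For comparison, a minimal Python sketch of the F-measure. The beta
    parameter is van Rijsbergen's general weighting (an addition not
    mentioned above); with beta = 1 it reduces to the formula
    2*precision*recall/(precision+recall). The function name f_measure is
    hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure; beta = 1 gives 2*P*R / (P + R) (hypothetical helper)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_measure(0.8, 0.6))  # 2 * 0.48 / 1.4 ≈ 0.6857
```

    Unlike the breakeven point, this is a direct formula evaluated at a
    single operating point, so no curve or interpolation is needed.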

    Computing the breakeven point (by interpolation) is more difficult than
    computing the F-measure (a simple formula), so I do not see any advantage
    of the breakeven point over the F-measure. One reason people keep using
    the breakeven point may be that they have to compare their results with
    those of previous researchers, who measured performance with the
    breakeven point. But besides this, does anybody know of any arguments for
    using the breakeven point instead of the F-measure in text
    categorization?

    Best regards

    Fuchun

    --------------------------------------------------------
     Fuchun Peng PhD candidate
     Computer Science Department, University of Waterloo
     Waterloo, Ontario, Canada, N2L 3G1
     1-519-888-4567 ext 5392 f3peng@ai.uwaterloo.ca
     http://ai.uwaterloo.ca/~f3peng


