Re: Corpora: parser recommendation

From: Miles Osborne (osborne@cogsci.ed.ac.uk)
Date: Mon Jan 22 2001 - 18:04:13 MET

    yes, i agree: when making claims that approach X is better than approach
    Y, people really ought to also train and test on material other than just
    wsj / bnc etc. the key ideas here are bias and variance: if an approach
    (eg neural nets, EM etc) has high bias and high variance (the latter
    meaning it gives different results when either the parameters or the
    training set vary) then results reported on a single distribution won't
    necessarily hold in some other scenario. in other words, just because
    your parser is good at wsj doesn't necessarily mean that it will be good
    at susanne.
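
    to make the variance point concrete, here is a minimal sketch (not from
    any parser or from the paper cited in this message). everything in it is
    an assumption for illustration: the toy 1-d data generator, the 1-nearest-
    neighbour learner standing in for an unstable method, and the names
    make_data, train_1nn and spread_over_resamples. it retrains on bootstrap
    resamples of one training pool and reports the mean and spread of accuracy
    on a test set drawn from the same distribution and on one drawn from a
    shifted distribution.

```python
import random
import statistics

def make_data(n, shift, rng):
    """Toy 1-D binary task: label is 1 when x plus noise exceeds `shift`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        y = 1 if x + rng.gauss(0.0, 1.0) > shift else 0
        data.append((x, y))
    return data

def train_1nn(train):
    """1-nearest-neighbour: memorises the training set (an unstable learner)."""
    def predict(x):
        # Return the label of the closest training point.
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)

def spread_over_resamples(pool, tests, n_runs, rng):
    """Train on bootstrap resamples of `pool`; return (mean, stdev) per test set."""
    scores = {name: [] for name in tests}
    for _ in range(n_runs):
        # Bootstrap: sample with replacement, same size as the pool.
        train = [rng.choice(pool) for _ in pool]
        model = train_1nn(train)
        for name, test in tests.items():
            scores[name].append(accuracy(model, test))
    return {name: (statistics.mean(s), statistics.stdev(s))
            for name, s in scores.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable
pool = make_data(200, shift=0.0, rng=rng)
tests = {
    "in_domain": make_data(500, shift=0.0, rng=rng),
    "out_of_domain": make_data(500, shift=1.0, rng=rng),  # shifted boundary
}
results = spread_over_resamples(pool, tests, n_runs=20, rng=rng)
for name, (mean, sd) in results.items():
    print(f"{name}: mean accuracy {mean:.3f}, stdev {sd:.3f}")
```

    a single score from one training set on one test set hides both numbers
    this prints: the stdev across resamples, and the gap between the two test
    distributions.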

    here's an example paper that, at least in my opinion, takes empirical
    evaluation seriously:

    http://robotics.stanford.edu/~ronnyk/vote.ps.gz

    Bauer, Eric, and Kohavi, Ron. An Empirical Comparison of Voting
    Classification Algorithms: Bagging, Boosting, and Variants. Machine
    Learning, Vol. 36, Nos. 1/2, July/August 1999, pages 105-139.

    also check the "no free lunch" theorem - can't find a link off-hand.

    Miles Osborne


