Re: Corpora: Collaborative effort

From: Bob Krovetz (krovetz@research.nj.nec.com)
Date: Tue Jun 13 2000 - 03:01:03 MET DST

    Jeremy Clear wrote:

    >... That's the crucial thing -- you spend no significant
    >time agonizing over the task; you just quickly pick some concordance
    >lines and send them in. Sure, not everyone will agree 100% that the
    >lines you've picked exactly match the sense I posted (first because
    >the sense I posted was just an arbitrary definition taken from one
    >dictionary which is clearly inadequate to define and delimit precisely
    >a semantic range; and second, because no-one is going to validate or

    Philip Resnik wrote:

    >I agree -- especially since tolerance of noise is necessary even when
    >working with purportedly "quality controlled" data. And one can
    >always post-process to clean things up if quality becomes an issue

    I don't mean to put a damper on this idea, but we should expect that
    the agreement rate will be far from 100%. Also, the tolerance of noise
    will depend on the amount of noise. I did a comparison between the
    tagging of the Brown files in Semcor and the tagging done by DSO.
    I found that the agreement rate was 56%. This is exactly the rate of
    agreement we would find by chance. So the amount of post-processing
    could be quite a bit of work!
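
To make the comparison between an observed agreement rate and a chance rate concrete, here is a minimal sketch in Python. It is not the procedure used for the Semcor/DSO comparison above and does not reproduce the 56% figure. The sense labels and the function names (observed_agreement, chance_agreement, cohen_kappa) are hypothetical, and Cohen's kappa is swapped in as one common way to correct agreement for chance.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of items on which the two taggings assign the same label."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("taggings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def chance_agreement(tags_a, tags_b):
    """Agreement expected if each tagger assigned labels independently,
    following only its own overall label frequencies."""
    n = len(tags_a)
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    return sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)


def cohen_kappa(tags_a, tags_b):
    """Chance-corrected agreement: 0 means no better than chance, 1 is perfect."""
    p_o = observed_agreement(tags_a, tags_b)
    p_e = chance_agreement(tags_a, tags_b)
    if p_e == 1.0:
        # Both taggers used one identical label throughout; kappa is undefined,
        # so report perfect agreement by convention (an assumption of this sketch).
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical sense tags for eight occurrences of one word.
tagger_a = ["s1", "s1", "s2", "s1", "s2", "s1", "s1", "s3"]
tagger_b = ["s1", "s2", "s2", "s1", "s1", "s1", "s3", "s3"]

p_o = observed_agreement(tagger_a, tagger_b)
p_e = chance_agreement(tagger_a, tagger_b)
kappa = cohen_kappa(tagger_a, tagger_b)
print(f"observed={p_o:.3f} chance={p_e:.3f} kappa={kappa:.3f}")
```

When the observed rate equals the chance rate, kappa is 0, which is the situation described above: the raw agreement figure then says nothing about whether the two taggings share any real signal.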

    Bob

    krovetz@research.nj.nec.com


