Re: Corpora: Collaborative effort

From: Jem Clear (jem@cobuild.collins.co.uk)
Date: Sun Jun 11 2000 - 12:08:42 MET DST

    George

    Thanks for your comprehensive comments and suggestions. Here are
    some rejoinders:

    a) Of course the free, unrestricted distribution of the resulting
    collection of citations grouped by word/sense category is ESSENTIAL.
    That's the key point of the idea. Even well-meaning research projects
    carried out with public funding often yield results which it is
    difficult to obtain in full because of the commercial sensitivities
    of one or two of the commercial participants in the project
    consortium and even (dare I say) because many universities these
    days see themselves in competition with other research and learning
    institutions and are sometimes reluctant to give away data like this.

    b)
    > When you post a word, list ALL of its senses and indicate which one
    > you want to get... In fact, it does not seem so useful to get just
    > one sense. Why not give the word and all of its senses? Let the
    > participants sort the examples into sense1, sense2 etc.

    Yes. But I really think the collaborative nature of the idea will
    *only* work if the amount of effort required by any individual is
    minimal. Once you present, say, a word like "run" and offer 26
    different definitions in one block and ask people to submit citations
    for all 26 categories then you are really asking people to commit a
    significant amount of work to analyse the subtle variations in sense
    distinction, and sort through potentially hundreds of thousands of
    examples to pick out instances of each sense. My idea was that if you
    see a word + definition pair you can (without thinking too hard about
    it) pick from a corpus a few examples which seem, prima facie, to fit
    the selected sense. We can worry about the fine distinctions and
    overlapping sense categories later!

    c)
    > One of current
    > interest to me (Hint: Please use this one :-) ) is "today".
    > Today1 = (N) The day of the utterance. Today is June 9.
    > Today2 = (N) The current time period. Today's man is always busy.
    > Today3 = (ADV) happening on the current day. I went to the store today.
    > Today4 = (ADV) happening in the current period, nowadays.
    > Today, we use computers to communicate.

    If this idea is to work, we cannot spend time and effort arguing over
    the sense categories themselves. I personally think that in your
    example above "today1" and "today3" are identical in meaning -- but
    that's no problem for this collaborative venture as long as we don't
    expect to get a database which covers *all* the sense distinctions
    everyone would like to make.

    Cheers

    Jem

    PS Thanks for the examples for "fierce"


