Corpora: CFP NTCIR WS3 (2001/2002): Evaluation of IR, QA, Summarization

From: Noriko Kando (kando@nii.ac.jp)
Date: Tue Sep 25 2001 - 09:25:34 MET DST

    ... apology for duplicated post.

    The application deadline is approaching. Please register now!
    ===========================================================================

                               CALL FOR PARTICIPATION
                        The Third NTCIR Workshop (2001/2002)
            Evaluation of Information Retrieval, Q&A, and Summarization
                           September 2001 - October 2002

                   Meeting: October 8-10, 2002, NII, Tokyo Japan
                   URL: http://research.nii.ac.jp/ntcir/workshop/
                            enquiries: ntcadm@nii.ac.jp
    ============================================================================

    An evaluation workshop on Asian language text retrieval, Q&A, and text
    summarization will be held from September 2001 to October 2002.
    Participation is invited from anyone interested in retrieval of various
    kinds of text, cross-lingual information retrieval of Asian languages
    from large-scale collections, and Q&A and text summarization of Japanese
    texts.
       This year we have picked five areas of research as tasks: Cross Language
    Retrieval, Patent Retrieval, Question Answering, Automatic Text
    Summarization, and Web Retrieval. An optional task is available in
    the Patent Retrieval and Web Retrieval Tasks. Any proposal using the data
    provided is welcome for the optional task, and we hope it will provide
    an occasion to explore new tasks.

    WORKSHOP OBJECTIVES
       * To encourage research in information retrieval, Q&A, and text
         summarization by providing reusable test collections.
       * To provide a forum for research groups interested in comparing
         results and exchanging ideas or opinions in an informal atmosphere.
       * To improve the quality of the test collections based on the
         feedback from participants.

    TASK DESCRIPTION
       Below is a brief summary of the tasks envisaged for the Workshop.
    A participant will conduct one or more of the tasks or subtasks below.
    Participation in only one subtask (for example, Japanese monolingual IR
    (J-J) in the CLIR Task) is also possible:

    1. Cross Language Retrieval Task (clir)
    Documents and topics are in four languages (Chinese, Korean, Japanese,
    and English).
       * Multilingual CLIR (MLIR): Search a document collection in more than
         one language using topics in one of the four languages, excluding
         Korean documents.
       * Bilingual CLIR (BLIR): Search with any two different languages as
         topic language and document language, excluding searches of English
         documents.
       * Single Language IR (SLIR): Monolingual search of Chinese, Korean, or
         Japanese.
    DOCUMENT: newspapers published in Asia:
    - Chinese: CIRB010, United Daily News (1998-1999)
    - Korean: Korea Economic Daily (1994)
    - Japanese: Mainichi Newspaper (1998-1999)*
    - English: Taiwan News and China English News (1998-1999),
       Mainichi Daily News (1998-1999)*

    2. Patent Retrieval Task (patent)
       * Main Task
            o Cross-language Cross-DB retrieval: retrieve patents in
              response to J/E/C newspaper articles associated with
              technology and commercial products.
            o Monolingual Associative Retrieval: retrieve patents associated
              with an input Japanese patent.
       * Optional task: Research reports on any kind of patent processing
         using the above data are invited, including, but not limited to:
         generating patent maps, paraphrasing claims, aligning claims
         and examples, summarization for patents, clustering patents.
    DOCUMENT:
    - Japanese patents: 1998-1999 (about 17GB)
    - Japio patent abstracts: 1995-1999
    - Patent Abstracts of Japan (English translations for
       Japio patent abstracts): 1995-1999
    - Patolis test collection (34 topics and relevance assessment)
    - Newspaper articles (Japanese/English/Traditional Chinese)

    3. Question Answering Task (qac)
       * Task 1: The system extracts five answers from the documents in some
         order. 100 questions. The system is required to return support
         information for each answer. The support information is assumed
         to be a paragraph, a 100-character passage, or a document that
         includes the answer.
       * Task 2: The system extracts only one answer from the documents. 100
         questions. Support information is required.
       * Task 3: Evaluation of a series of questions. Related questions
         are given for 30 of the questions in Task 2.
    DOCUMENT: Japanese newspaper articles (Mainichi Newspaper 1998-1999)*

    4. Automatic Text Summarization Task (tsc2)
       * Task A (single document summarization): Given the texts to be
         summarized and the summarization lengths, the participants submit
         a summary of each text in plain text format.
       * Task B (multi-document summarization): Given a set of texts, the
         participants produce summaries of the set in plain text format. The
         information that was used to produce the document set, such as
         queries, as well as the summarization lengths, is given to the
         participants.
    DOCUMENT: Japanese newspaper articles (Mainichi Newspaper 1998-1999)*

    5. Web Retrieval Task
       * A. Survey Retrieval: Survey retrieval is similar to
         traditional ad hoc retrieval for scientific documents or
         newspapers, where the system searches a static document set
         using newly provided topics. Recall and precision are
         weighted evenly in the evaluation. Two types of subtasks
         are provided: retrieval using topics in almost the same
         format as in past NTCIR workshops ('A1. Topic Retrieval')
         and retrieval using given relevant documents
         ('A2. Similarity Retrieval'). The page is the basic
         unit for evaluation; however, evidential passages can be
         used for complementary evaluation. Here, evidential
         passages are the parts of each relevant document that give
         evidence for the relevance judgment; submitting them is
         not mandatory.

       * B. Target Retrieval: Target retrieval aims to evaluate
         the effectiveness of retrieval when the user requires
         just one answer or a few (e.g. fact-type retrieval,
         or retrieval of a site's top page), where precision
         should be emphasized. Runs will be submitted as the
         ranked top 10 documents retrieved for each topic,
         accompanied by evidential passages (not mandatory). Several
         evaluation measures will be applied.

       * C. Optional Tasks: Participants can freely submit
         proposals using the document set used in subtasks A and B,
         according to their own research interests. The results will
         be presented as a paper or poster at the NTCIR-3 workshop
         meeting. If a proposal can involve several participants,
         it can be adopted as a subtask and investigated in
         detail. 'C1. Search Results Classification' and 'C2.
         Speech-Driven Retrieval' are examples of the optional
         tasks.
    DOCUMENT: Web documents mainly collected from the jp domain (ca. 100GB
              and ca. 10GB), available at the "Open-Lab" at NII.
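
    The announcement does not name the exact evaluation measures for the
    Web Retrieval Task, so the following is only a generic sketch of how
    precision over a ranked top-10 list and recall over a full result list
    are commonly computed. The function names and document IDs are
    hypothetical and are not part of any NTCIR tool.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k positions that hold a relevant document.

    A run shorter than k is still divided by k, so missing results
    count as non-relevant (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that appear anywhere in the run."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)


# Hypothetical run for one topic: document IDs in ranked order.
run = ["d3", "d7", "d1", "d9", "d4"]
# Hypothetical relevance judgments for the same topic.
judged_relevant = {"d1", "d3", "d8"}

p_at_5 = precision_at_k(run, judged_relevant, k=5)  # 2 of the top 5 are relevant
r = recall(run, judged_relevant)                    # 2 of the 3 relevant are found
```

    A precision-oriented subtask such as Target Retrieval would look mainly
    at the first function, while Survey Retrieval would consider both.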

    WORKSHOP SCHEDULE
    2001-09-30 Application Due
    2001-10-01 Document release (newspaper)
    2001-10/2002-01 Dry Run and Round-Table Discussion
                         (depends on each task)
    2001-12 Open Lab start
    2001-12/2002-03 Formal Run (depends on each task)
    2002-07-01 Evaluation Results Delivery
    2002-08-20 Paper for Working Note Due
    2002-10-08/10 NTCIR Workshop 3 Meeting
                 Days 1-2: Closed session (task participants only)
                 Day 3: Open session
    2002-12-01 Paper for Final Proceedings Due

    TYPES OF PARTICIPATION
       * A. FULL: Submit results and describe the system. The
         correspondence between the group name and the group ID will
         be announced.
       * B. ANONYMOUS: Submit results. The details of the system need not be
         reported. The correspondence between the group name and the group
         ID is not announced. This category is mainly for participants
         from companies that have difficulty reporting the details.

    The list of participating groups will be made public, although the
    evaluation results will be announced using the group IDs only.
    Regardless of the type of participation, every participating group must
    submit (1) paper(s) for the workshop proceedings, (2) a system
    description form which describes your system, and (3) bibliographic
    references and a copy of any paper you publish using NTCIR
    test collections.

    APPLICATIONS
    Online application:
    http://research.nii.ac.jp/ntcir/workshop/application-en.html

    ENQUIRIES
       * Please send email to Noriko Kando, program chair, or to the
         NTCIR Project administrators (ntcadm@nii.ac.jp).
       * For the details of a specific task, please contact each task's
         chair and organizers.

    NEW FEATURES of NTCIR WS3 TASKS
         * Two Types of CLIR
           (1) Multilingual CLIR of Asian Languages and English (CLIR)
           (2) CLIR of Technical Information: Search Japanese Patent
               documents by English/Chinese/Japanese topics.
               English-Japanese paired abstracts (ca. 1,500,000 docs)
                are included in the test collection used for NTCIR WS3.
         * Optional Tasks (Patent & Web): Any research groups interested in
            using the document collections provided in these tasks for
            their own research projects are invited!
            We also expect that this venture will explore possible
            new tasks for future NTCIR workshops.
         * Search by Documents (Patent & Web)
         * Passage Retrieval (Patent, QA & Web)
         * Precision-oriented Evaluation (QA & Web) and Multigrade Relevance
            Judgments (CLIR, Patent & Web)

    NOTES
       * The proceedings will be published online as well as in printed form.
       * Dissemination of the research results using the NTCIR collections
         other than in the Workshop's Proceedings is welcome. However, the
         conditions of participation preclude specific advertising claims
         based on the results using the Collection or the Workshop.
       * International participants are welcome. Announcements will be in
         English and Japanese.
       * The official language for the proceedings papers and presentation
         at the Workshop meeting in October, 2002 is English.
       * Documents will be provided to participants who have returned the
         required user agreement forms.
       * DOCUMENT USAGE: The period of permitted use of Mainichi Newspapers
         and Mainichi Daily News is from 2001-09-01 to 2003-09-30. Active
         participants who submit results and who are affiliated with an
         organization outside Japan will be able to extend the period
         up to 2008-09-30. After the permitted period ends,
         participants must delete all the document data. Those
         who want to use the data after the period can purchase it
         from Mainichi Newspaper Co. and obtain permission for
         research use from the company. The permitted period
         may vary according to each task.
    -----------------------------------------------------------------------------

    Noriko Kando.
    ntcir project


