[Corpora-List] ACL-2003 Workshop CFP: Workshop on Patent Corpus Processing

From: Priscilla Rasmussen (rasmusse@cs.rutgers.edu)
Date: Wed Mar 26 2003 - 00:03:18 MET

    *** Apologies for multiple copies ***

    ACL 2003 Workshop on Patent Corpus Processing
    12 July 2003, Sapporo, Japan

    CALL FOR PAPERS
    http://www.slis.tsukuba.ac.jp/~fujii/acl2003ws.html

    =======================
    Workshop Description
    =======================

    The goal of this workshop is to foster research and development of the
    technology for patent corpus processing, by providing a forum in which
    researchers and practitioners can exchange and share their ideas,
    approaches, perspectives, and experiences from their work in progress.

    The processing of intellectual property (IP) documents, including
    patents, is important in the scientific, business, and law
    communities. Much of the focus for patent and IP processing has been
    in the database and information retrieval communities, but not in the
    computational linguistics (CL) and natural language processing (NLP)
    communities.

    In 2000, the first ACM SIGIR Workshop on Patent Retrieval was held.
    At this workshop, the patent retrieval systems in use at the EPO
    (European Patent Office) and JAPIO (Japanese Patent Information
    Organization) were introduced, and a number of issues related to
    patent retrieval (e.g., producing ontologies, cross-language
    retrieval, and evaluation methods) were raised and discussed.

    In 2001-2002, the NTCIR workshop, a TREC-style evaluation forum for
    research and development on IR/NLP organized by the National
    Institute of Informatics, Japan, ran its first patent retrieval
    task. Two years of Japanese patents (approximately 7M documents
    published in 1998-1999; 18GB) were used to evaluate
    mono/cross-lingual patent retrieval systems. In addition,
    approximately 17M Japanese/English parallel patent abstracts were used
    to evaluate the effectiveness of extracting translation lexicons.

    =======================
    Areas of Interest
    =======================

    Patent corpora are associated with a number of interesting
    characteristics, for which various CL/NLP techniques have promise for
    improving the quality of patent processing.

    * multilinguality: the same/similar contents (i.e., inventions) are
    filed in different languages, for which machine translation,
    cross/multi-lingual retrieval, and translation extraction alleviate
    problems in accessing information in foreign languages.

    * scalability: a huge amount of corpus data is available, and more is
    produced periodically, for which text summarization and natural
    language generation help produce understandable, coherent, condensed
    contents.

    * complexity: since patents consist of overwhelmingly long sentences,
    parsing/chunking techniques help produce readable shorter fragments.

    * classification: patents are manually categorized based on a specific
    classification system, such as the IPC (International Patent
    Classification), and the resulting labels can be used by statistical
    classification methods.

    * novelty/temporality/dynamism: new terms and concepts associated with
    inventions are periodically created, for which term extraction and
    ontology construction techniques help update lexical resources for
    patent processing.

    * document structures: unlike newspaper articles, patents are
    structured with a number of specific fields (e.g., titles, abstracts,
    and claims). While conventional text segmentation techniques rely
    mainly on linguistic contents (e.g., lexical chains), structure
    analysis techniques (e.g., ones related to XML) are also crucial in
    the context of CL/NLP.

    * applications: the above techniques can directly contribute to a
    number of applications, such as patent retrieval systems.

    We invite both research papers and project papers associated with, but
    not limited to, the aspects of patent corpus processing listed
    above. We also invite papers addressing applications and user
    studies.

    =======================
    Important Dates
    =======================

    Submission deadline: 10 April 2003
    Acceptance notification: 12 May 2003
    Final version deadline: 30 May 2003
    Workshop date: 12 July 2003

    =======================
    Workshop Chairs
    =======================

    Makoto Iwayama, Tokyo Institute of Technology / Hitachi Ltd., Japan
    Atsushi Fujii, University of Tsukuba, Japan

    =======================
    Contact Information
    =======================

    Atsushi Fujii, fujii@slis.tsukuba.ac.jp
    University of Tsukuba, Japan


