RE: [Corpora-List] statistical named entity recognition

From: Mari Olsen (molsen@microsoft.com)
Date: Thu Jan 02 2003 - 18:53:47 MET

    I am chairing a workshop on 12 July 2003, after ACL 2003 (Sapporo), intended to address questions related to multilingual NE recognition and the reusability of statistical and symbolic methods across languages. I encourage you (and others) to submit a paper to the workshop and/or to attend it. Here's the relevant info (the website should be up by 10 January, and an official CFP will go out via the customary channels; the tentative submission deadline is 7 March 2003). (Note: Microsoft is providing some travel funds to help defray expenses for students.)

    Mari Broman Olsen
    Natural Language Group
    **************************************************************
    Title and description:
    Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models

    Organizing Committee:
    Kevin Humphreys, Mari Broman Olsen, Joseph Pentheroudakis, Robert Stumberger, Hajime Wada

    Description:
    Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains. Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks. Is it possible to
            -maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?
            -acquire and share resources (including lexicons and grammars) across languages?
            -balance performance speed with reasonable accuracy?
            -use specific language patterns while permitting rapid transfer to another language?
            -minimize variability in results across language types?

    We welcome research on combined models, in which these tradeoffs are calculated in particular ways. We hope that the workshop will bring together work on robust and deep multilingual and mixed-language NE recognition from different perspectives. Possible topics include
            -the role of the lexicon vs. dynamic processing information
            -grammars and lexicons shared (or ported) across languages
            -acquisition of multilingual resources (e.g. from corpora)
            -translating NEs across multiple languages
            -domain tuning

    Papers may cover one or more of these (or related) areas. Demonstrations of implemented NE systems are also welcome.

    -------------
    Program committee
    Roberto Basili (University of Roma Tor Vergata)
    Robert Gaizauskas (Sheffield)
    Ralph Grishman (New York University)
    Lauri Karttunen (Parc, Inc.)
    Kevin Knight (ISI)
    Gary Geunbae Lee (Pohang University of Science and Technology)
    Dekang Lin (University of Alberta)
    Boyan Onyshkevich (Department of Defense)
    John Prager (IBM)
    Jeff Reynar (Microsoft)
    Mila Ramos-Santacruz (SRA)
    Ellen Riloff (University of Utah)
    Beth Sundheim (NCCOSC, San Diego)
    Janine Toole (Gavagai Technology)
    Benjamin Tsou (City Univ. of Hong Kong)
    Marc Vilain (MITRE)
    Sornlertlamvanich Virach (Thailand National Electronics and Computer Technology)

    -----Original Message-----
    From: Åsne Thea Fraser Haaland [mailto:a.t.haaland@ilf.uio.no]
    Sent: Thursday, January 02, 2003 3:45 AM
    To: corpora@hd.uib.no
    Subject: [Corpora-List] statistical named entity recognition

    Hello list members,
    My Ph.D. thesis is to be on named entity recognition for Norwegian. I want
    to use existing programming tools implementing different statistical
    methods. Most of my reading has been on maximum entropy modelling. Do any
    of you have any experience with existing tools that can be used for named
    entity recognition? Ideally I would like to be able to experiment with the
    kind of information provided to the system, so I want open source code that
    can be modified. In the case of maximum entropy modelling I would
    appreciate the possibility of trying different algorithms. It would be an
    extra bonus if I could try out the frequency redistribution algorithm
    advocated by Mikheev.
    I intend to post a summary of the comments received. I appreciate your help.
    Best,
    Åsne Haaland

    Åsne Haaland, doctoral research fellow
    Tekstlaboratoriet (the Text Laboratory), Dept. of Linguistics (http://www.hf.uio.no/tekstlab)
    P.O. Box 1102 Blindern, 0317 Oslo; visiting address: room 523, Henrik Wergelands hus
    Tel.: 22 85 67 87, fax: 22 85 69 19
    E-mail: a.t.haaland@ilf.uio.no


