Re: [Corpora-List] statistical named entity recognition

From: Jose Maria Gomez Hidalgo (jmgomez@dinar.esi.uem.es)
Date: Tue Jan 07 2003 - 14:24:03 MET


    At 12:45 02/01/2003 +0100, you wrote:

    >Hello list members,
    >My Ph.D. thesis is to be on named entity recognition for Norwegian. I want
    >to use existing programming tools implementing different statistical
    >methods. Most of my reading has been on maximum entropy modelling. Do any
    >of you have any experience with existing tools that can be used for named
    >entity recognition?

    I have no experience with these tools, but here are a couple of references:

    [Chieu02] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Named Entity Recognition:
    A Maximum Entropy Approach Using Global Information. Proceedings of
    the 19th International Conference on Computational Linguistics (COLING
    2002). (pp. 190-196). Taipei, Taiwan.

    [Chieu02b] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Teaching a Weaker
    Classifier: Named Entity Recognition on Upper Case Text. Proceedings
    of the 40th Annual Meeting of the Association for Computational
    Linguistics (ACL-02). (pp. 481-488). Philadelphia, Pennsylvania, USA.

    Also, there may be relevant papers from the CoNLL-2002 workshop. Its shared
    task focused on language-independent named entity recognition, and the web
    page with papers, results, and training and test data for Spanish and Dutch is:

    http://cnts.uia.ac.be/conll2002/ner/
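    If you download that data, a small reader is a natural first step. The sketch
    below is a minimal Java reader that assumes the usual CoNLL column layout (one
    token per line, word first and named entity tag last, blank lines between
    sentences, -DOCSTART- lines marking document boundaries); please check the
    files themselves before relying on it. The class name, method names and the
    sample sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ConllReader {
    /** One token with its named entity tag (assumed to be the last column). */
    public record Token(String word, String tag) {}

    /**
     * Splits CoNLL-style text into sentences. Assumed layout: one token per line,
     * whitespace-separated columns, word first, NE tag last, blank line between
     * sentences; lines starting with -DOCSTART- are skipped.
     */
    public static List<List<Token>> parse(String text) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line closes the current sentence, if any.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols[0].equals("-DOCSTART-")) {
                continue; // document separator, not a real token
            }
            current.add(new Token(cols[0], cols[cols.length - 1]));
        }
        if (!current.isEmpty()) {
            sentences.add(current); // last sentence may lack a trailing blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Hypothetical Dutch sample in the assumed two-column format.
        String sample = "-DOCSTART- -DOCSTART- O\n"
                + "Wim B-PER\nKok I-PER\nwoont O\nin O\nDen B-LOC\nHaag I-LOC\n\n"
                + "Hij O\nlacht O\n";
        for (List<Token> sentence : parse(sample)) {
            System.out.println(sentence);
        }
    }
}
```

    Taking the tag from the last column rather than a fixed position lets the
    same reader accept files with extra columns (for example a POS column).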

    >Ideally I would like to be able to experiment with the kind of information
    >provided to the system, so I want open source code that can be modified.
    >In the case of maximum entropy modelling I would appreciate the
    >possibility of trying different algorithms.

    The package used by Chieu and Ng is maxent
    (http://maxent.sourceforge.net/), part of the OpenNLP project
    (http://opennlp.sourceforge.net/). It is open source, written in Java, and
    has been used to develop several classifiers in the Grok package
    (http://grok.sourceforge.net/), including a POS tagger and a name finder
    for English.
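    To show what such a toolkit does underneath, and where "the kind of
    information provided to the system" comes in, here is a minimal,
    self-contained Java sketch of a conditional maximum entropy classifier for
    tagging tokens. It is not the maxent package's API: the class, the feature
    extractor and its feature names are hypothetical, and training uses plain
    batch gradient ascent on the conditional log-likelihood rather than the
    iterative scaling (GIS) traditionally used for maxent models. The model is
    p(y|x) = exp(sum_i lambda_i f_i(x,y)) / Z(x), with one weight per
    (contextual predicate, outcome) pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyMaxent {
    // One weight per (contextual predicate, outcome) pair: lambda_i for feature f_i(x, y).
    private final Map<String, double[]> weights = new HashMap<>();
    private final String[] outcomes;

    public TinyMaxent(String[] outcomes) { this.outcomes = outcomes; }

    /** Hypothetical feature extractor: contextual predicates for token i of a sentence. */
    public static List<String> features(String[] words, int i) {
        String w = words[i];
        List<String> f = new ArrayList<>();
        f.add("bias");
        f.add("w=" + w.toLowerCase());
        f.add("cap=" + Character.isUpperCase(w.charAt(0)));
        f.add("prev=" + (i > 0 ? words[i - 1].toLowerCase() : "<s>"));
        f.add("suf3=" + w.substring(Math.max(0, w.length() - 3)).toLowerCase());
        return f;
    }

    /** p(y | x): exponentiated sum of active weights for y, normalised by Z(x). */
    public double[] probs(List<String> context) {
        double[] p = new double[outcomes.length];
        for (String pred : context) {
            double[] w = weights.get(pred);
            if (w != null) for (int y = 0; y < p.length; y++) p[y] += w[y];
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double s : p) max = Math.max(max, s);
        double z = 0;
        for (int y = 0; y < p.length; y++) { p[y] = Math.exp(p[y] - max); z += p[y]; } // max shift for stability
        for (int y = 0; y < p.length; y++) p[y] /= z;
        return p;
    }

    /** Batch gradient ascent on the average conditional log-likelihood (observed minus expected counts). */
    public void train(List<List<String>> contexts, List<Integer> labels, int iterations, double rate) {
        for (List<String> ctx : contexts)
            for (String pred : ctx) weights.putIfAbsent(pred, new double[outcomes.length]);
        int n = contexts.size();
        for (int it = 0; it < iterations; it++) {
            Map<String, double[]> grad = new HashMap<>();
            for (int k = 0; k < n; k++) {
                double[] p = probs(contexts.get(k));
                int gold = labels.get(k);
                for (String pred : contexts.get(k)) {
                    double[] g = grad.computeIfAbsent(pred, key -> new double[outcomes.length]);
                    for (int y = 0; y < outcomes.length; y++) g[y] += (y == gold ? 1.0 : 0.0) - p[y];
                }
            }
            for (Map.Entry<String, double[]> e : grad.entrySet()) {
                double[] w = weights.get(e.getKey());
                for (int y = 0; y < outcomes.length; y++) w[y] += rate * e.getValue()[y] / n;
            }
        }
    }

    /** Returns the most probable outcome for a context. */
    public String predict(List<String> context) {
        double[] p = probs(context);
        int best = 0;
        for (int y = 1; y < p.length; y++) if (p[y] > p[best]) best = y;
        return outcomes[best];
    }

    public static void main(String[] args) {
        // Hypothetical toy Norwegian data; labels index into {"O", "PER", "LOC"}.
        String[][] sentences = {{"Kari", "bor", "i", "Oslo"}, {"Per", "jobber", "i", "Bergen"}};
        int[][] tags = {{1, 0, 0, 2}, {1, 0, 0, 2}};
        List<List<String>> contexts = new ArrayList<>();
        List<Integer> labels = new ArrayList<>();
        for (int s = 0; s < sentences.length; s++)
            for (int i = 0; i < sentences[s].length; i++) {
                contexts.add(features(sentences[s], i));
                labels.add(tags[s][i]);
            }
        TinyMaxent model = new TinyMaxent(new String[] {"O", "PER", "LOC"});
        model.train(contexts, labels, 300, 0.5);
        String[] unseen = {"Ola", "bor", "i", "Trondheim"};
        for (int i = 0; i < unseen.length; i++)
            System.out.println(unseen[i] + " " + model.predict(features(unseen, i)));
    }
}
```

    The features method is the place to experiment with the information given
    to the model (gazetteers, document-level features and so on), and train is
    the piece to swap out when comparing training algorithms.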

    >It would be an extra bonus if I could try out the frequency redistribution
    >algorithm advocated by Mikheev.
    >I intend to post a summary of the comments received. I appreciate your help.
    >Best, Åsne Haaland
    >
    >
    >Åsne Haaland, stipendiat
    >Tekstlaboratoriet, Inst. for lingvistiske fag (http://www.hf.uio.no/tekstlab)
    >Pb. 1102 Blindern, 0317 Oslo; besøksadr.: rom 523 Henrik Wergelands hus
    >Tlf.: 22 85 67 87, faks: 22 85 69 1
    >E-post: a.t.haaland@ilf.uio.no
    >
    >

    _______________________________________________________________________________

    Jose Maria Gomez Hidalgo
    Departamento de Inteligencia Artificial
    Universidad Europea de Madrid
    28670 - Villaviciosa de Odon - MADRID
    (+34) 912115670
    jmgomez@dinar.esi.uem.es
    http://www.esi.uem.es/~jmgomez/
    _______________________________________________________________________________


    Spanish law guarantees privacy in electronic communications. This
    electronic transmission is strictly confidential and intended solely for
    the addressee. If you are not the intended addressee, you are kindly
    requested not to disclose nor to copy this transmission and to notify us as
    soon as possible.



    This archive was generated by hypermail 2b29 : Tue Jan 07 2003 - 14:26:56 MET