[Corpora-List] Announcement: MultiSemCor English/Italian parallel corpus

From: Luisa Bentivogli (bentivo@itc.it)
Date: Fri Nov 05 2004 - 19:17:06 MET


    [Apologies to those of you who receive multiple copies of this
    announcement]

    We are pleased to announce the first release of the MultiSemCor corpus,
    available for browsing and distribution at the web site:

    http://multisemcor.itc.it

    MultiSemCor is an English/Italian parallel corpus, developed at ITC-irst
    by translating part of the SemCor corpus into Italian. English and
    Italian texts have been automatically aligned at the word level and
    SemCor semantic annotations have been transferred to Italian words. As a
    result, MultiSemCor texts are semantically annotated with a shared
    inventory of senses taken from the MultiWordNet lexical database
    (http://multiwordnet.itc.it).
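
    The idea of transferring annotations through a word alignment can be
    illustrated with a minimal sketch. It is not the actual MultiSemCor
    pipeline: the function name, the data layout, and the sense labels
    (plain placeholders, not real MultiWordNet identifiers) are all
    assumptions made for the example.

```python
from typing import List, Optional, Tuple


def project_senses(
    source_senses: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Copy sense labels from source tokens to the target tokens aligned with them.

    `alignment` holds (source_index, target_index) pairs. Target tokens with
    no aligned, annotated source token stay unannotated (None).
    """
    target_senses: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        sense = source_senses[src_idx]
        # Keep the first sense projected onto a target token; skip unannotated sources.
        if sense is not None and target_senses[tgt_idx] is None:
            target_senses[tgt_idx] = sense
    return target_senses


# Hypothetical toy sentence pair; the labels are placeholders only.
english = ["the", "dog", "barks"]
italian = ["il", "cane", "abbaia"]
english_senses = [None, "SENSE_DOG", "SENSE_BARK"]
alignment = [(0, 0), (1, 1), (2, 2)]

italian_senses = project_senses(english_senses, alignment, len(italian))
```

    In this toy case every content word has a one-to-one alignment, so
    each Italian word simply inherits the label of its English
    counterpart. Real alignments also contain unaligned and
    many-to-one links, which the sketch handles only in the simplest way.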

    At present MultiSemCor is composed of 116 English texts along with their
    corresponding 116 Italian translations, for a total of about 500,000
    tokens.

    The parallel texts and their annotations can be consulted freely on the
    Web through the MultiSemCor online interface, which serves as both a
    bilingual semantic concordancer and a bi-text browser. The MultiSemCor
    and MultiWordNet browsers are directly linked to each other.

    Best regards,

    The MultiSemCor Team

    ---
    ITC-irst Centro per la Ricerca Scientifica e Tecnologica
    Cognitive and Communication Technologies Division
    Via Sommarive, 18  38050 Povo - Trento ITALY
    http://tcc.itc.it/
    


