[Corpora-List] MULTEXT-East language resources V3

From: Tomaz Erjavec (tomaz.erjavec@ijs.si)
Date: Wed Jun 30 2004 - 17:08:56 MET DST


    MULTEXT-East V3: http://nl.ijs.si/ME/V3/

    The MULTEXT-East resources are a multilingual dataset for language
    engineering research and development. For each of Bulgarian, Croatian,
    Czech, English, Estonian, Hungarian, Lithuanian, Resian, Romanian,
    Russian, Serbian, and Slovene, the dataset contains some or all of the
    following resources:
    - MULTEXT-East morphosyntactic specifications (free)
    - MULTEXT-East morphosyntactic lexica (licence)
    - MULTEXT-East morphosyntactically annotated "1984" corpus (licence)
    - MULTEXT-East comparable corpus (licence)
    - MULTEXT-East parallel speech corpus (free)
    - associated documentation (free).

    The resources comply with the EAGLES and TEI recommendations and are
    freely available for research use. To access the licensed resources,
    you need to fill out and submit the online licence.

    What's new in this edition?
    - all corpora now encoded in XML TEI P4
    - combines the resources from Version 1 (1998) and Version 2 (2002)
    - adds Serbian annotated "1984" and Resian morphosyntactic specifications
    - an updated bibliography
    - many errors from previous versions corrected
    - and probably some new ones introduced...

    Hope you find them useful!

    -- 
    Tomaž Erjavec           | Dept. of Knowledge Technologies
    email: tomaz.erjavec@ijs.si  | Jozef Stefan Institute
    www:   http://nl.ijs.si/et/  | Jamova 39, SI-1000, Ljubljana
    fax:   (+386 1) 4251 038     | Slovenia
    