Corpora: non-English corpora

From: jre@comp.leeds.ac.uk
Date: Thu Jun 07 2001 - 17:03:51 MET DST


    Dear list members

    I wrote on June 1st:

    >I am holding out my begging bowl again! I am trying to find non-English
    >PoS-TAGGED corpora, which can be as little as a few thousand words. I am
    >ideally looking for such languages as Arabic, Hindi, Russian, Basque,
    >Spanish, Vietnamese, Latin and even Sanskrit. Any of these or similar
    >would be most welcome.

    I have had some very good responses and will be posting my thanks etc. soon. In the
    interim, does anyone know of, or have it in their power to grant me access to, corpora
    for any of the following languages or their close relatives:
    Vietnamese, Tamil, Hausa, Malay, Gaelic, Greek, Japanese, Russian or any of the North
    American Indian languages?

    ...still hopeful

    John

    ********************************************************
    John Elliott
    Centre for Computer Analysis of Language and Speech
    University of Leeds
    email: jre@scs.leeds.ac.uk
    phone: 0113 233 6827
    Web-site http://www.scs.leeds.ac.uk/jre
    ********************************************************


