[Corpora-List] EMILLE Tools Release

From: Mcenery, Tony (eiaamme@exchange.lancs.ac.uk)
Date: Fri Feb 20 2004 - 12:58:56 MET

    Dear All,

    Apologies if you receive multiple copies of this message, especially if
    you have no interest whatsoever in its contents.

    Following a number of requests, I have decided to mount the EMILLE
    character encoding conversion software (unicodify) on the EMILLE
    download site (http://www.ling.lancs.ac.uk/corplang/emille/default.htm).
    The conversion software was developed at Lancaster University and
    allows users to convert 30 (or so) different 8-bit encodings of South
    Asian scripts, commonly found both in publishing and on the web, into
    16-bit little-endian Unicode format. The software is very useful indeed if
    you plan to collect South Asian corpus data from the web. As with the
    EMILLE corpus, the software may be used freely for non-commercial
    research.
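
    For readers unfamiliar with this kind of conversion, the general idea
    can be sketched in Python as below. This is only an illustration and
    not the unicodify implementation: the byte values in the table are
    invented, the function name is hypothetical, and a real converter for
    a legacy South Asian font encoding would likely also need to handle
    multi-byte glyph sequences and character reordering, which this sketch
    ignores.

```python
from typing import Mapping

# Hypothetical table for an imaginary 8-bit font encoding. The byte values
# are made up for illustration; the targets are real Devanagari code points.
LEGACY_TO_UNICODE = {
    0xA1: "\u0915",  # DEVANAGARI LETTER KA
    0xA2: "\u0916",  # DEVANAGARI LETTER KHA
    0xA3: "\u0917",  # DEVANAGARI LETTER GA
}


def legacy_to_utf16le(data: bytes, table: Mapping[int, str]) -> bytes:
    """Map each 8-bit code to Unicode, then encode as 16-bit little-endian."""
    chars = []
    for byte in data:
        if byte < 0x80:
            # Assumption: the lower half of the code page is plain ASCII.
            chars.append(chr(byte))
        elif byte in table:
            chars.append(table[byte])
        else:
            # Unknown code: emit U+FFFD REPLACEMENT CHARACTER.
            chars.append("\ufffd")
    # "utf-16-le" writes little-endian code units without a byte order mark.
    return "".join(chars).encode("utf-16-le")
```

    Calling legacy_to_utf16le(b"\xa1\xa2", LEGACY_TO_UNICODE) yields four
    bytes, two per character, with the low-order byte of each code unit
    first.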

    Also, an Urdu POS tagger is now mounted on the EMILLE download site.
    Again, it is free for use in non-commercial research.

    Both downloads include documentation etc.

    Enjoy!

    Tony McEnery,
    Professor of English Language and Linguistics,
    Dept. Linguistics and Modern English Language,
    Lancaster University,
    Bailrigg,
    Lancaster,
    LA1 4YT.


