[Corpora-List] Helsinki Corpus of Swahili released for academic use

From: Kielipankki (ling@csc.fi)
Date: Mon Oct 25 2004 - 11:45:58 MET DST


    Helsinki Corpus of Swahili released

    The Helsinki Corpus of Swahili (HCS) has been released and is
    available at the Language Bank of Finland for academic research
    purposes on an interactive Linux server and via a web interface,
    WWW-Lemmie. All usage requires a personal user account.

    HCS is an annotated corpus of Standard Swahili text. It contains news
    texts from several current Swahili newspapers as well as from the news
    site of Deutsche Welle. It also contains extracts from a number of
    books containing prose text, including fiction, education and
    sciences. The total size of the corpus is 12.5 million words in 25,000
    XML documents. The XML format used is a derivative of TEI.

    HCS has been annotated with SALAMA (Swahili Language Manager), a
    multi-purpose language management environment, developed at the
    University of Helsinki by Arvi Hurskainen, Professor of African
    languages. The corpus contains information on features such as the
    base form of the word (lemma), part-of-speech, and morphology,
    including noun class affiliation and verb morphology. It also contains
    the etymology of loan words and glosses in English.

    For more information about the corpus (and a link to the web-based
    application form), go to:

        http://www.csc.fi/kielipankki/aineistot/hcs/index.phtml.en

    Note that commercial use of the corpus, including the interactive
    use of SALAMA, is possible, but must be negotiated separately with
    Professor Hurskainen (ahurskai AT ling DOT helsinki DOT fi).

    Best regards,

    Mickel Grönroos and Manne Miettinen
    The Language Bank of Finland
    at the Finnish IT center for science CSC

    Arvi Hurskainen
    Professor of African languages, University of Helsinki

    --
    Kielipankki | Språkbanken i Finland | The Language Bank of Finland
    The Finnish IT center for science CSC
    PL 405 (Tekniikantie 15 a D), 02101 Espoo, Finland, +358-9-4572237
    


