Corpora: Free Swedish lexical and corpora resources for research purposes

From: Yvonne Cederholm (Yvonne.Cederholm@svenska.gu.se)
Date: Fri Apr 05 2002 - 13:37:58 MET DST

    Språkdata and Språkbanken (The Bank of Swedish), Department for Swedish,
    University of Göteborg, have decided to release dictionary and corpus
    resources for research and education purposes within Swedish
    universities. The resources are available under certain conditions,
    which are specified in the licence files attached to the resource files.
    Foreign universities can apply to the acting director of The Bank of
    Swedish, Yvonne Cederholm (lbadm@svenska.gu.se).

    Dictionary: Svenska ord (LEXIN)
    -------------------------------
    A Swedish dictionary containing approx. 20 000 lexical units (lexical
    categories: pronunciation, part-of-speech, inflexion, definition,
    valency, and linguistic examples).

    The dictionary is available in two formats:

    - web version (access only for Swedish universities)
    Address: http://spraakbanken.gu.se/lb/lexin/

    - XML version for language technology purposes
    Address: ftp://ftp.spraakbanken.gu.se/pub/reskit/LEXIN.zip

    The SynTag Tree Bank
    --------------------
    A Swedish tree bank containing 158 newspaper articles (about 100 000
    running words) from the Press-65 corpus.
    The corpus can only be used for research purposes and for higher
    education. Instructions are required, as the format doesn't follow modern
    markup standards. Contact Jerker Järborg (Jerker.Jaerborg@svenska.gu.se)
    for more information.
    Address: ftp://ftp.spraakbanken.gu.se/pub/reskit/syntag.zip

    The Swedish PAROLE corpus
    -------------------------
    A morphosyntactically annotated corpus comprising about 19 million running
    words. The corpus can only be used for research purposes and for higher
    education.
    Address: ftp://ftp.spraakbanken.gu.se/pub/reskit/parole.zip

    There is also a web version of the Swedish PAROLE corpus (unrestricted
    access):
    http://spraakbanken.gu.se/lb/parole/

    The Language Bank plans to release a new lemmatized and
    morphosyntactically annotated corpus of about 100 million running words at
    the end of this year. The annotation is based on the information in the
    SAOL (The Swedish Academy Glossary).

    The board of the Language Bank of Swedish:
    Yvonne Cederholm (acting director), Jerker Järborg, Torgny Rasmark, and
    Karin Warmenius

    --
    __________________________________
    Yvonne Cederholm
    Acting director of Språkbanken
    Inst. för svenska språket
    Göteborgs universitet
    Box 200
    SE 405 30 GÖTEBORG
    tfn.: +46 (0)31 - 773 52 25
    fax: +46 (0)31 - 773 44 55


