[Corpora-List] Summary Newspaper Corpora 2

From: Jan Strunk (strunk@linguistics.ruhr-uni-bochum.de)
Date: Wed Apr 16 2003 - 16:46:34 MET DST


    Hello,

    As there have been quite a few responses
    to my second query, I'll post another summary.

    In my second query, I asked about available newspaper
    corpora in languages other than English, French, German
    and Spanish (for which a lot of resources seem to exist).

    Many thanks for your information!

    Best regards!
    Jan Strunk
    strunk@linguistics.ruhr-uni-bochum.de

    Rik De Busser suggested a meta-site:
    >This might help:
    >http://www.ims.uni-stuttgart.de/info/Newspapers.html
    >(Not all of them are for free)

    Yvonne suggested the following sources:
    >Danish http://korpus.dsl.dk/korpus2000/indgang.php

    >Bosnian http://www.tekstlab.uio.no/Bosnian/Corpus.html

    >Swedish http://spraakdata.gu.se/lb/konk/

    Antti Arppe suggested the Finnish language bank:
    >Well there is a substantial amount of Finnish newspaper corpora (tens
    >of millions of words) and a lesser amount of Swedish newspaper
    >material (published in Finland) available in the Finnish text bank:
    >
    >http://www.csc.fi/kielipankki/
    >
    >All the info appears to be in Finnish or Swedish, but you can try to
    >contact e.g. Manne Miettinen, tel. +358 9 457 2517 e-mail:
    ><manne.miettinen@csc.fi>.

    Elisabeth Burr:
    >I can only help out with Italian, French and Spanish newspaper corpora.
    >See:
    >
    >http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/humcomp/
    >
    >Oxford Text Archive Corpus of Italian Newspapers
    >
    >"Italian Newspaper Corpus (ita03)", in: Association for Computational Linguistics:
    >European Corpus Initiative Multilingual Corpus 1 (ECI/MCI) CD-ROM:
    >\data\ eci1\

    Bilge Say:
    >About your recent posting to corpora-list, we have a corpus of 2 M words of post
    >1990 written Turkish, which includes about 40% newspaper material (not all
    >of them news items though, including editorials, columns etc). It is available for
    >free for academic purposes; contact our project assistant Umut Ozge at
    >umut@ii.metu.edu.tr
    >for filling out the required form and receiving the corpus over the
    >Internet.
    >
    >Kemal Oflazer at Sabanci University has also a newspaper corpus of Turkish
    >(I think about 10 M words). He can be contacted at oflazer@sabanciuniv.edu

    Seza Dogruoz suggested that I contact Bilge Say.

    Shlomo Yona also suggested a Turkish corpus and offered help with Hebrew:
    >I have corpora of newspaper articles in Hebrew.
    >Tagged Turkish news text can be found at:
    >http://www.nlp.cs.bilkent.edu.tr/Center/Corpus/

    And last but not least, Paul McNamee suggested the CLEF project:
    >Not for those three, but the CLEF activity has created a newspaper
    >corpus in 8 languages with O(100k) articles per language from the years
    >1994 and 1995. In addition to German and English they have:
    >Dutch, Finnish, French, Italian, Spanish, and Swedish. Check out
    >the CLEF site at http://www.clef-campaign.org/ You might also want
    >to investigate the holdings of ELRA and the LDC.

    Many thanks again!


