Re: [Corpora-List] Newspaper Corpora

From: Jan Strunk (strunk@linguistics.ruhr-uni-bochum.de)
Date: Tue Apr 15 2003 - 11:36:52 MET DST

    Hi,

    Yesterday I asked for suggestions about newspaper corpora.
    Many thanks to the people who have answered so far.
    They have already provided me with a lot of suggestions (summary below).
     
    Unfortunately, all the suggested corpora were either
    in English or in German. It's exactly these two languages that
    I have already evaluated (on the Wall Street Journal corpus and
    Neue Zürcher Zeitung).

    Do you perhaps know of any newspaper corpora in other languages like Danish,
    Turkish or Hungarian?

    Thanks!

    Jan
    strunk@linguistics.ruhr-uni-bochum.de
    Sprachwissenschaftliches Institut
    Ruhr-Universität Bochum
    Germany

    Summary of the responses I got so far:

    Mahtab Nikkhou suggested looking at the ELDA resources collection:
    >You may have a look at ELDA's on-line language resources catalogue from:
    >http://www.elda.fr/cata/tabtext.html
    >If you wish to order a database, please contact Ms Valerie Mapelli at
    >mapelli@elda.fr

    Jana Diesner suggested the following for German:
    > the classic: http://corpora.ids-mannheim.de/~cosmas/, also at: http://www.ids-mannheim.de/kt/corpora.shtml
    > alternatively: http://www.coli.uni-sb.de/sfb378/negra-corpus/

    Tony Rose:
    > You could also try the Reuters Corpus:
    > http://about.reuters.com/researchandstandards/corpus/
    > It's an archive of some 800,000 English language news stories, is freely available, and marked up in XML (NewsML in fact).

    Jerome Richalot:
    >How about the METER Corpus at
    >http://www.dcs.shef.ac.uk/nlp/meter/Metercorpus/metercorpus.htm
     
    And last but not least Thorsten Brants proposed the NEGRA corpus:
    >the NEGRA Corpus (http://www.coli.uni-sb.de/sfb378/negra-corpus/)
    >contains articles from the German newspaper Frankfurter Rundschau.
    >As part of the syntactic annotation, the texts are separated into sentences,
    >which disambiguates the periods.

    Thanks again!



    This archive was generated by hypermail 2b29 : Tue Apr 15 2003 - 11:34:26 MET DST