Re: [Corpora-List] English-language paraphrase corpora

From: Gregor Erbach (gor@acm.org)
Date: Tue Feb 01 2005 - 11:16:14 MET


    Hi Olga,
    Google News (news.google.com) groups news articles from different
    sources that report on the same event, so its clusters can be used
    to construct such a corpus.
    However, many of the articles will be duplicates, because different
    newspapers reprint the same text from the press agencies.
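
    As a rough illustration of how such agency duplicates might be
    filtered out of a cluster, here is a minimal Python sketch. It
    assumes each cluster is already available as a list of plain-text
    articles; the function names, the word-trigram shingles and the 0.8
    threshold are illustrative choices, not anything Google News provides.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(articles, threshold=0.8, k=3):
    """Keep the first article of every group of near-identical texts.

    The threshold is an assumed value and would need tuning on real data.
    """
    kept, kept_shingles = [], []
    for text in articles:
        s = shingles(text, k)
        # Keep the article only if it is not too similar to any kept one.
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept
```

    Calling drop_near_duplicates(cluster) on one cluster should leave
    only articles that differ substantially in wording, which are the
    ones of interest for paraphrasing. The pairwise comparison is
    quadratic in cluster size, which is probably acceptable for clusters
    of a few dozen articles.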

    regards,

        Gregor

    Quoting Olga Shaumyan <olgas@sussex.ac.uk>:

    >
    > Dear All,
    >
    > I am looking for English-language "comparable" corpora, i.e., I want,
    > e.g., two collections of articles from different sources describing the
    > same events.
    >
    > Alternatively, would anyone know off-hand how one would go about
    > constructing such comparable collections?
    >
    > (This is to be used for automatic paraphrasing.)
    >
    > Any pointers greatly appreciated,
    >
    > Olga
    > University of Sussex NLP group

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Dr. Gregor Erbach http://purl.org/net/gregor/
    DFKI GmbH, Language Technology Lab http://www.dfki.de/
    Tel. +49 (681) 302-5354 mailto:erbach@dfki.de


