Re: [Corpora-List] English-language paraphrase corpora

From: Paul Clough (p.d.clough@sheffield.ac.uk)
Date: Wed Feb 02 2005 - 11:56:18 MET


Dear all,

I have a collection of around 1800 news agency and newspaper texts created by
trained journalists for the specific purpose of analysing text reuse within
journalism. This collection, the METER corpus, is currently available for
research use and can be obtained by contacting either Prof. Rob Gaizauskas
(robertg@dcs.shef.ac.uk) or myself. In the corpus, we have up to 9 UK national
newspaper versions of an agency text (including both tabloid and broadsheet
versions), each categorised as derived or not derived from the agency version.
More information about text reuse in journalism can be found in my thesis
(available for download at http://ir.shef.ac.uk/cloughie/papers.html) and on
the METER web page: http://www.dcs.shef.ac.uk/nlp/meter/

Regards,

Paul.

-------------------------------------------
Dr. Paul Clough
Dept. Information Studies
University of Sheffield

+44 (0)114 2222664
-------------------------------------------

Quoting radev@umich.edu:

> Our system, a precursor to Google News, is also active on the Web:
>
> www.newsinessence.com
>
> Using it, we have collected 50,000 or so clusters of related news.
>
> --
> Drago
>
>
> nielsen@dcs.kcl.ac.uk wrote:
> >
> >
> > If you don't mind collecting raw text, news.google.com does this.
> >
> > Leif
> >
> > >
> > > Dear All,
> > >
> > > I am looking for English-language "comparable" corpora, i.e. I want,
> > > e.g., two collections of articles from different sources describing the
> > > same events.
> > >
> > > Alternatively, would anyone know off-hand how one would go about
> > > constructing such comparable collections?
> > >
> > > (This is to be used for automatic paraphrasing.)
> > >
> > > Any pointers greatly appreciated,
> > >
> > > Olga
> > > University of Sussex NLP group
> > >
>
>
> --
> Dragomir R. Radev radev@umich.edu
> Assistant Professor of Information, Electrical Engineering and
> Computer Science, and Linguistics, the University of Michigan, Ann Arbor
> Phone: 734-615-5225 Fax: 734-764-2475 http://www.si.umich.edu/~radev
>
>


