Re: [Corpora-List] unencumbered corpora

From: Francis Bond (bond@cslab.kecl.ntt.co.jp)
Date: Mon Jan 24 2005 - 09:12:23 MET

Next message: Przemek Kaszubski: "Re: [Corpora-List] My semantic prosody questionnaire"

Previous message: ELDA: "[Corpora-List] CLEF 2005 - CALL FOR PARTICIPATION"
In reply to: Lou Burnard: "[Corpora-List] unencumbered corpora"

G'Day,

Lou Burnard <lou.burnard@computing-services.oxford.ac.uk> writes:

> Can anyone point me to any annotated language corpora which are freely
> available under something like the GNU Public Licence? All the ones I
> have thought of so far seem to be available only under some kind of
> complicated licensing scheme which precludes (e.g) commercial
> exploitation, unrestricted copying, etc. And cost money.

OPUS <http://logos.uio.no/opus/> sounds ideal. It includes many
European (and even non-European) texts, is freely available (under the
GPL or similar licenses), and is even POS-tagged and marked up in XML.

>
> I'd like to have a corpus of a reasonable size (1 million+ words) in any
> European language (tho English or French are preferable) with some
> kind of word-level annotation, which I can hack about, use in teaching,
> and put on a freely-distributable CD, without worrying about copyright
> lawyers. There *must* be some somewhere!

OPUS is already distributed on the Knorpora CD
<http://sslmit.unibo.it/%7ebaroni/welcome_to_knorpora.html>, a
modified version of the Knoppix 3.3 Live CD for students of
corpus-based computational linguistics.

> It doesn't even have to be in XML -- though it will be when I've
> finished with it.

-- 
Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
NTT Communication Science Laboratories | Machine Translation Research Group

