Re: [Corpora-List] unencumbered corpora

From: Francis Bond (bond@cslab.kecl.ntt.co.jp)
Date: Mon Jan 24 2005 - 09:12:23 MET

    G'Day,

    Lou Burnard <lou.burnard@computing-services.oxford.ac.uk> writes:

    > Can anyone point me to any annotated language corpora which are freely
    > available under something like the GNU Public Licence? All the ones I
    > have thought of so far seem to be available only under some kind of
    > complicated licensing scheme which precludes (e.g) commercial
    > exploitation, unrestricted copying, etc. And cost money.

    OPUS <http://logos.uio.no/opus/> sounds ideal. It includes many
    European (and even non-European) texts, is freely available (under
    the GPL or similar licenses), and is even POS-tagged and marked up
    in XML.

    >
    > I'd like to have a corpus of a reasonable size (1 million+ words) in any
    > European language (tho English or French are preferable) with some
    > kind of word-level annotation, which I can hack about, use in teaching,
    > and put on a freely-distributable CD, without worrying about copyright
    > lawyers. There *must* be some somewhere!

    OPUS is already distributed on the Knorpora CD
    <http://sslmit.unibo.it/%7ebaroni/welcome_to_knorpora.html>, a
    modified version of the Knoppix 3.3 Live CD for students of
    corpus-based computational linguistics.

    > It doesn't even have to be in XML -- though it will be when I've
    > finished with it.

    -- 
    Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
    NTT Communication Science Laboratories | Machine Translation Research Group
    


