Corpora: corpora variety/summary

From: Vladimir Rykov, PhD in Computational Linguistics, Moscow (rykov2000@mail.ru)
Date: Wed Aug 30 2000 - 08:36:59 MET DST


         Fans of tagging may skip my letter again.

         At the request of many list members, I am sending a compilation
    of the answers I received:

                              -----------------

    Hi. Michael Barlow's pages may be a good start:
    <http://www.ruf.rice.edu/~barlow/corpus.html>
    You are also welcome to take a look at the CoSIH site (follow the link
    below and the links from there).

    http://spinoza.tau.ac.il/hci/dep/semitic/izreel.html

    Good luck,
    Shlomo Izre'el

                              ------------------

    I can recommend the ICAME archive pages. They have all the documentation
    of the ICAME CD-ROM online.
    See http://www.hd.uib.no/icame/newcd.htm.

    There are a number of sources for a critical discussion of genre
    divisions in corpora, such as Kessler et al. (Proc. ACL '97) or a paper
    I wrote together with Mathias Kirsten (Proc. EACL '99).

    Cheers, Maria

         My comments: I could not get in contact with Maria or Mathias
    Kirsten, which is a pity, especially since their publications are
    unavailable to me :-(. - Vl R

                              ------------------

    There is quite a good article by David Lee of Lancaster University on genre
    and corpora (particularly the BNC). It can be downloaded from
    http://members.xoom.com/davidlee00/downld.htm

    Regards, Veronika Koller

         My comments: it is a really good article. - Vl R

                              ------------------

    The Linguistic Data Consortium's Catalog describes the 170 corpora that
    the consortium currently distributes. You can find the catalog at:
        http://www.ldc.upenn.edu/Catalog
    On that page you can see various summaries of our corpora and search
    them by data type, data source (broadcast news, conversation), language,
    recommended application, and the sponsored research program, if any,
    that developed them.

    Please let us know if this doesn't answer your questions. You can write
    to me or to ldc@ldc.upenn.edu for more information.

    Best wishes, Chris

                              ------------------

    Since I did not see any replies to Vladimir, here is an answer.
    But I think that others may have much better suggestions.

    You can search the archives of this list via
    http://linguistlist.org/ . That is probably the most
    comprehensive source.

    Old (1996) information below:

    There is a site with a survey at each of these addresses:

    http://www.hd.uib.no
    http://www.clres.com/siglex.html
    http://www.ruf.rice.edu/~barlow/corpus.html
    http://www.ling.lu.se
    http://clr.nmsu.edu/clr/CLR.html
    http://www.cogsci.ed.ac.uk/elsenet/eci_summary.html
    http://www.ids-mannheim.de/telri/telri.html
    http://www.ceth.rutgers.edu

    I have no idea whether any of these are still valid.

    Happy searching. Bill Mann

                              ------------------

         I sent my thanks to ALL the people mentioned above.

                                Vladimir Rykov

      Linguistic Institute of the RAS


