Re[2]: [Corpora-List] Looking for super large Russian corpus

From: Victor Zakharov (vz1311@mail.ru)
Date: Mon Oct 25 2004 - 15:57:02 MET DST


    -----Original Message-----

    > http://lib.ru/ claims to have close to 5Gb of Russian-language text, multiple
    > genres, sources, etc.
    >
    > a substantial part of it is OCR'ed, and consequently some pieces exhibit
    > problems, such as end-of-page hyphenation. so you may have to do some quality
    > control, depending on your needs.
    >
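The end-of-page hyphenation problem mentioned above can often be reduced with a simple clean-up pass before further processing. The following is only a minimal sketch in Python: the function name `rejoin_hyphenated` is hypothetical and is not part of lib.ru or of any tool named in this thread. It assumes the text is already decoded to Unicode and that a hyphen directly before a line break marks a split word.

```python
import re

# Hypothetical helper for illustration only; not taken from any tool in this thread.
# Matches: a word character, a hyphen, optional trailing spaces, a line break,
# any further whitespace (blank lines, indentation), then another word character.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\r?\n\s*(\w)")


def rejoin_hyphenated(text: str) -> str:
    """Join word halves that were split by a hyphen at a line or page break."""
    # Keep the two surrounding characters, drop the hyphen and the whitespace.
    return _HYPHEN_BREAK.sub(r"\1\2", text)
```

Note that this heuristic also merges genuine hyphenated compounds (for example "кто-то") when they happen to be split at a line break, so a dictionary check or manual spot check may still be needed, depending on your needs.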

    Part of this digital library has been tagged and is accessible as a regular corpus at:
    http://www.aot.ru/search1.html

    Victor Zakharov
    Department of Mathematical Linguistics
    St.Petersburg State University


