Re: [Corpora-List] Looking for super large Russian corpus

From: Roman Yangarber (roman@cs.nyu.edu)
Date: Sat Oct 23 2004 - 22:16:21 MET DST


    > Date: Sat, 23 Oct 2004 14:36:47 +0400 (MSD)
    > From: "P bI K O B___ B.B. (MOCKBA)" <rykov@narod.ru>
    >
    > I am looking for a very large Russian corpus to use in my research project.
    > The corpus doesn't need any tagging; it can be plain Russian text.

    http://lib.ru/ claims to have close to 5 GB of Russian-language text, covering
    multiple genres, sources, etc.

    A substantial part of it is OCR'ed, and consequently some pieces exhibit
    problems such as end-of-page hyphenation, so you may have to do some quality
    control, depending on your needs.
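
    As a rough illustration of that kind of clean-up, here is a minimal Python
    sketch that rejoins words split by a hyphen at a line or page break. The
    function name join_hyphenated and the regular expression are my own
    assumptions, not anything provided by lib.ru, and the sketch assumes the
    text has already been decoded to a Unicode string.

```python
import re

# Hypothetical helper, not part of any lib.ru tooling.
# Matches: a letter, a hyphen, optional spaces/tabs, a newline, any further
# whitespace (e.g. blank lines at a page break), then another letter.
# [^\W\d_] means "a letter" (a word character that is not a digit or underscore),
# which covers Cyrillic as well as Latin in Python 3 str patterns.
_HYPHEN_BREAK = re.compile(r"([^\W\d_])-[ \t]*\n\s*([^\W\d_])")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were hyphenated across a line or page break."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "some pieces exhibit пробле-\nмы"
    print(join_hyphenated(sample))
```

    Note that this also strips the hyphen from genuine compounds (such as
    "какой-то") when they happen to fall on a line break, so a dictionary
    check or manual review may still be needed, depending on your needs.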

    -- 
    Roman Yangarber
    ______________________________     __________________________________________
                                       Research Assistant Professor
           voice +1 (212) 998-3264     Department of Computer Science            
             fax +1 (212) 995-4123     Courant Institute of Mathematical Sciences
                                       New York University                       
                  roman@cs.nyu.edu     715 Broadway, 7th Floor
              www.cs.nyu.edu/roman     New York, NY 10003-6806                   
    ______________________________     __________________________________________
          mobile: +358 50 4668 383     in Finland
    ______________________________     __________________________________________
    


