Re: [Corpora-List] Q: How to identify duplicates in a large document collection

From: Marc Kupietz (kupietz@ids-mannheim.de)
Date: Wed Jan 12 2005 - 14:12:21 MET

    As promised, the part of our tool that calculates n-gram-based
    similarities in text collections can now be downloaded via anonymous FTP from:

    ftp://ftp.ids-mannheim.de/kt/CSSCCb-4.0.tar.bz2
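For readers who have not met the idea before, here is a rough sketch of what an n-gram-based similarity between two texts can look like. It is not taken from the CSSCCb package and makes no claim about how that tool works internally: the function names `word_ngrams` and `jaccard_similarity`, the choice of word trigrams, and the Jaccard measure are all assumptions made for illustration.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a text (hypothetical helper).

    Tokenisation is plain lowercasing plus whitespace splitting, which is
    an assumption; a real tool would likely tokenise more carefully.
    """
    tokens = text.lower().split()
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the two texts' n-gram sets, between 0.0 and 1.0."""
    grams_a = word_ngrams(a, n)
    grams_b = word_ngrams(b, n)
    if not grams_a and not grams_b:
        # Neither text has any n-gram; treat them as not comparable.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "the quick brown fox leaps over the lazy dog"
    print(jaccard_similarity(doc1, doc2))
```

Document pairs whose score exceeds some threshold would then be candidates for near-duplicates. Comparing every pair this way is quadratic in the number of documents, so for a large collection some form of indexing or hashing of the n-grams would probably be needed.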

    Regards,
     Marc

    P.S.: Our network connection is up only about 80% of the time, and
    currently only active-mode FTP is possible...
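Since many FTP clients default to passive mode, here is a minimal sketch of fetching the file in active mode with Python's standard `ftplib`. The helper names `split_ftp_url` and `download_active` are hypothetical, and the sketch assumes that the server accepts anonymous logins and that your own firewall permits the incoming data connection that active mode requires.

```python
from ftplib import FTP
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Split an ftp:// URL into (host, path). Hypothetical helper."""
    parts = urlsplit(url)
    return parts.hostname, parts.path


def download_active(url, dest):
    """Download url to the local file dest using active-mode FTP."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()          # no arguments means an anonymous login
        ftp.set_pasv(False)  # switch from passive (the default) to active mode
        with open(dest, "wb") as out:
            ftp.retrbinary("RETR " + path, out.write)


if __name__ == "__main__":
    download_active(
        "ftp://ftp.ids-mannheim.de/kt/CSSCCb-4.0.tar.bz2",
        "CSSCCb-4.0.tar.bz2",
    )
```

Given the unreliable connection mentioned above, a failed transfer may simply need to be retried later.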

    -- 
    Marc Kupietz                                      Tel. (+49) 621/1581-409
    Institut für Deutsche Sprache, Dept. of Lexical Studies/Corpus Technology
    PO Box 101621, 68016 Mannheim, Germany        http://www.ids-mannheim.de/
    


