Re: [Corpora-List] Q: How to identify duplicates in a large document collection

From: Marc Kupietz (kupietz@ids-mannheim.de)
Date: Wed Jan 12 2005 - 14:12:21 MET

Next message: Spenader J.K.: "[Corpora-List] Second-CFP: Cross modular approaches to ellipsis: ESSLLI workshop"

Previous message: Grant, T.: "[Corpora-List] Punctuation follow up"
In reply to: Marc Kupietz: "Re: [Corpora-List] Q: How to identify duplicates in a large document collection"
Next in thread: Normand Peladeau: "Re: [Corpora-List] Q: How to identify duplicates in a large document collection"

As promised, you can now download the part of our tool that calculates
n-gram-based similarities in text collections via anonymous FTP from:

ftp://ftp.ids-mannheim.de/kt/CSSCCb-4.0.tar.bz2
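The posting does not say how CSSCCb computes its similarities. For readers who want to experiment before fetching the tarball, here is a minimal sketch of one common n-gram approach, assuming word trigram "shingles" compared with Jaccard similarity. The function names, the trigram size and the 0.5 threshold are illustrative assumptions, not taken from the tool.

```python
import re
from itertools import combinations


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams ("shingles") in text."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        # Treat very short texts as one shingle so they can still be compared.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs: dict, n: int = 3, threshold: float = 0.5) -> list:
    """Return (id1, id2, score) for every document pair scoring >= threshold.

    Hypothetical helper: it compares all pairs, which is O(len(docs)**2),
    so a genuinely large collection would need hashing or indexing instead.
    """
    shingles = {doc_id: word_ngrams(text, n) for doc_id, text in docs.items()}
    pairs = []
    for d1, d2 in combinations(sorted(shingles), 2):
        score = jaccard(shingles[d1], shingles[d2])
        if score >= threshold:
            pairs.append((d1, d2, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy cat",
    "c": "completely different text about corpus linguistics",
}
print(find_near_duplicates(docs))
```

Here "a" and "b" share 6 of their 8 distinct trigrams, so they are reported with a score of 0.75, while "c" shares none with either. Pairwise comparison is only practical for small sets; it is shown here for clarity, not as a model of how a production tool scales.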

Regards,
Marc

P.S.: Our network connection is currently only about 80% operational, and
at the moment only active-mode FTP works...

-- 
Marc Kupietz                                      Tel. (+49) 621/1581-409
Institut für Deutsche Sprache, Dept. of Lexical Studies/Corpus Technology
PO Box 101621, 68016 Mannheim, Germany        http://www.ids-mannheim.de/


This archive was generated by hypermail 2b29 : Wed Jan 12 2005 - 14:25:37 MET