RE: [Corpora-List] IDF values

From: Min-Yen Kan (kanmy@comp.nus.edu.sg)
Date: Wed May 12 2004 - 10:39:46 MET DST


    Hi Clive De Silva:
            This doesn’t quite fit the bill, but if you don’t mind an
    international corpus, UC Berkeley has computed the DFs of words on the
    Stanford WebBase corpus. See

    http://elib.cs.berkeley.edu/docfreq/

    My group has been using it for a number of different projects that require
    DF / IDF.
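
For anyone starting from raw text rather than a precomputed table, a
minimal sketch of going from documents to DF and then IDF is below. It
assumes whitespace tokenisation, lowercasing, and the plain log(N / df)
form of IDF; the function names and the toy documents are illustrative
only and say nothing about how the Berkeley counts were produced or how
that file is laid out.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # set() so a term repeated inside one document is counted once
        df.update(set(doc.lower().split()))
    return df


def idf(df, n_docs):
    """Plain IDF, log(N / df), for every term present in the DF table."""
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Toy corpus (hypothetical), standing in for a real document collection.
docs = ["the cat sat", "the dog sat", "the bird flew"]
df = document_frequencies(docs)
idf_values = idf(df, len(docs))
```

With a precomputed DF table the first step drops out: only the total
document count N and the per-term DF are needed. Smoothed variants such
as log(1 + N / df) are common as well.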

    Regards,

    Min-Yen KAN
    Assistant Professor
    Department of Computer Science, School of Computing
    National University of Singapore, Singapore 117543
    Office: S15-05-05
    Tel: ++ (65) 6874-1885
    Fax: ++ (65) 6779-4580
    kanmy@comp.nus.edu.sg
    http://www.comp.nus.edu.sg/~kanmy

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Clive De Silva
    Sent: Wednesday, May 12, 2004 4:24 PM
    To: CORPORA@HD.UIB.NO
    Subject: [Corpora-List] IDF values

    Hi all.
     
    I need to get IDF values for an American corpus of at least 100M words. I
    have access to the TREC4 and TREC5 corpora but would prefer not to have to
    extract the information 'manually' and was wondering if there are IDF values
    out there already calculated from a large corpus. If not, are there any
    tools for extracting IDFs efficiently?
     
    Regards,

    Clive De Silva
    MPhil student at the Computing Lab
    University of Cambridge, UK


