Re: [Corpora-List] Comparing files

From: Lluís Padró (padro@lsi.upc.es)
Date: Mon Nov 17 2003 - 09:54:29 MET


    >I'm doing a project that involves comparing two very large word lists (~40,000 and ~70,000 words). I need to find out which words are on one list but not on the other (and/or vice versa).
    >Can anyone give me a hint as to how to do this? (I was thinking maybe a Perl script?)
    >
    >

      sort list1 > list1.sorted
      sort list2 > list2.sorted
      join -v1 list1.sorted list2.sorted

      (if you use -v2 instead, you'll get words in list2 and not in list1)
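
      A note on the commands above: join matches on the first
      whitespace-separated field of each line, so they assume one word per
      line, and both files must be sorted in the same collation order. The
      sketch below is an alternative, not the method given above: it uses
      comm, which compares whole lines, and forces the C locale so that sort
      and comm agree on ordering. The file contents and the output names
      only_in_list1 and only_in_list2 are made up for illustration.

```shell
# Hypothetical sample data standing in for the two real word lists.
printf '%s\n' pear apple fig apple > list1
printf '%s\n' fig kiwi apple > list2

# Sort bytewise and drop duplicate words; comm expects sorted input.
LC_ALL=C sort -u list1 > list1.sorted
LC_ALL=C sort -u list2 > list2.sorted

# comm prints three columns: only in file 1, only in file 2, in both.
# -23 suppresses columns 2 and 3, leaving words found only in list1.
LC_ALL=C comm -23 list1.sorted list2.sorted > only_in_list1
# -13 suppresses columns 1 and 3, leaving words found only in list2.
LC_ALL=C comm -13 list1.sorted list2.sorted > only_in_list2
```

      With this sample data, only_in_list1 should contain "pear" and
      only_in_list2 should contain "kiwi".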

           best

    -- 
    ------------------------------------------------------------------------
    * Lluís Padró i Cirera * UNIVERSITAT POLITÈCNICA DE CATALUNYA
    *Departament de Llenguatges i Sistemes Informàtics <http://www.lsi.upc.es>*
    *Centre de Recerca TALP <http://www.talp.upc.es>*
    Tel: XX-34-934 015 652
    Fax: XX-34-934 017 014
    padro@lsi.upc.es <mailto:padro@lsi.upc.es>
    http://www.lsi.upc.es/~padro <http://www.lsi.upc.es/%7Epadro>
    Mòdul C6 - Campus Nord
    Jordi Girona Salgado 1-3
    08034 Barcelona
    

    ------------------------------------------------------------------------


