Re: Corpora: Wordsmith question

From: Darren Pearce (darrenp@cogs.susx.ac.uk)
Date: Wed Apr 03 2002 - 16:47:08 MET DST

    On Wed, 3 Apr 2002, PD Dr. Edward Wornar wrote:

    > From: Van den Heuvel M Mev <MVDH@sun.ac.za>
    > Subject: Corpora: Wordsmith question
    > Date: Wed, 3 Apr 2002 11:13:22 +0200
    >
    > > Hi everybody,
    > >
    > > I'm having a spot of trouble with the Wordlist tool in the Wordsmith suite
    > > that I hope someone out there can help me with. I want to compare two almost
    > > identical word lists containing the entries of a pronunciation lexicon.
    > > There are some inconsistencies between the lists, i.e. items missing in the
    > > one that should be in the other and vice versa. I need to identify the
    > > missing words. I thought that I could use the "compare word lists" function
    > > in Wordlist for this purpose by setting the minimum frequency to 1 word, but
    > > it's not working. I'm obviously doing something wrong.
    > >
    > > If you don't have a quick answer to the Wordsmith problem, but know of
    > > another tool that could help me do just this one little task with a few
    > > button clicks, I would also appreciate your response!
    >
    > If the format of the wordlists is just plain text with one word on each line,
    > a simple diff should do the trick. What system are you using? If it's a UNIX-like
    > system, you'll have diff, otherwise you might want to get the cygwin tools. At
    > the shell prompt, something like
    >
    > diff wordlist1 wordlist2 > differences
    >
    > will write the differences into a file 'differences'. If you want a user interface
    > so as to take over parts from one file into the other or see the files side by side
    > with the differences marked, try emacs (or XEmacs) which comes with the useful tool
    > ediff.
    >
    > Cheers
    >
    > Edi

    Again assuming that your files are plain text with one word per line,
    you could also use the Unix 'comm' command. Given two sorted files, it
    prints three columns: lines unique to the first file, lines unique to
    the second, and lines common to both. Any of these columns can be
    suppressed with the -1, -2 and -3 options.
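
    A minimal sketch of the comm approach, for illustration only. The file
    names wordlist1 and wordlist2 are taken from the diff example quoted
    above; the sample words, the temporary directory and the variable names
    are made up here. comm expects sorted input, so both lists are sorted
    first, with LC_ALL=C to keep the sort order consistent.

```shell
# Hypothetical sample data: two small word lists, one word per line.
workdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$workdir/wordlist1"
printf '%s\n' banana damson cherry > "$workdir/wordlist2"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort "$workdir/wordlist1" > "$workdir/wordlist1.sorted"
LC_ALL=C sort "$workdir/wordlist2" > "$workdir/wordlist2.sorted"

# -2 and -3 suppress columns 2 and 3: only lines unique to the first file remain.
only_in_1=$(LC_ALL=C comm -23 "$workdir/wordlist1.sorted" "$workdir/wordlist2.sorted")
# -1 and -3 suppress columns 1 and 3: only lines unique to the second file remain.
only_in_2=$(LC_ALL=C comm -13 "$workdir/wordlist1.sorted" "$workdir/wordlist2.sorted")
# -1 and -2 suppress both "unique" columns: only the common lines remain.
in_both=$(LC_ALL=C comm -12 "$workdir/wordlist1.sorted" "$workdir/wordlist2.sorted")

echo "Only in wordlist1:"; echo "$only_in_1"
echo "Only in wordlist2:"; echo "$only_in_2"
echo "In both:";           echo "$in_both"
```

    The two "only in" lists are the words missing from the other file,
    which is what the original question asks for.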

    Good luck.

    Darren.

    Darren Pearce
    COGS, Sussex University, Falmer, Brighton
    Mobile: 07950 255 448
    Email: darrenmpearce@bigfoot.com
    Web: http://www.cogs.susx.ac.uk/users/darrenp


