Re: [Corpora-List] neologism finder tools

From: J R Elliott (jre@comp.leeds.ac.uk)
Date: Thu Jun 12 2003 - 16:48:15 MET DST


    Making life simple:
    You could of course compare old and new versions with the Unix command
    'diff', or flavours of it, run on sorted word lists (one word per line).
    This will indicate which words occur in the new version only, and it
    takes only a few seconds. No bloated Windows environment to slow things
    down.
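
For anyone without a Unix shell to hand, the same old-versus-new comparison can be sketched in Python. This is only a minimal illustration, assuming both versions fit in memory and that a "word" is a lowercase alphabetic token (optionally with internal hyphens or apostrophes); the function names and sample sentences are made up for the example and do not come from any existing tool.

```python
import re

# Assumption: a "word" is a run of letters, optionally joined by ' or -.
WORD_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def vocabulary(text):
    """Return the set of lowercase word types found in the text."""
    return set(WORD_RE.findall(text.lower()))


def new_words(old_text, new_text):
    """Return the words that occur in new_text but not in old_text, sorted."""
    return sorted(vocabulary(new_text) - vocabulary(old_text))


if __name__ == "__main__":
    # Hypothetical sample data standing in for the old and new corpus versions.
    old = "The corpus tracks the language over time."
    new = "The corpus tracks blogging and texting over time."
    print(new_words(old, new))  # → ['and', 'blogging', 'texting']
```

As with 'diff', the result is only "new" relative to whatever old version you compare against, so common words missing from a small reference text will show up too.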

    It's amazing what's to hand if you use a 'real' operating system :)

    John
    *********************************************************
    John Elliott
    Centre for Computer Analysis of Language and Speech
    University of Leeds. http://www.comp.leeds.ac.uk/jre/
    and Computational Intelligence Group, School of Computing
    Leeds Metropolitan University
    email: jre@comp.leeds.ac.uk or J.Elliott@lmu.ac.uk
    Home: 0113 286 6517 john.elliott@leedsalumni.org.uk
    *********************************************************

    On Thu, 12 Jun 2003, Eric Atwell wrote:

    > Sylvana,
    > A problem with "retrieving new words in a corpus" is: "new" with respect
    > to what? You can easily find all words in a corpus with only one (or
    > two..) occurrences, which makes them "rare"; but "new" implies
    > your corpus builds on a larger monitor corpus tracking the language over
    > time. As I understand it, AVIATOR/APRIL is not just software for a
    > static corpus but infrastructure for processing a (large) monitor corpus.
    > Is this what you have?
    >
    > Eric Atwell
    >
    >
    > On Thu, 12 Jun 2003, krausse wrote:
    >
    > > Dear colleagues,
    > >
    > > In Lynne Bowker's and Jennifer Pearson's book "Working with Specialized
    > > Corpora" neologism finder tools like the ones used in the AVIATOR/APRIL
    > > project are mentioned.
    > >
    > > I wonder whether there are any free or commercial programs available or
    > > how other people go about retrieving new words in a corpus.
    > >
    > > Many thanks in advance,
    > >
    > > Sylvana Krausse
    > >
    >
    >
