RE: [Corpora-List] Wordsmith concordance

From: Lou Burnard (lou.burnard@computing-services.oxford.ac.uk)
Date: Thu Dec 19 2002 - 10:59:00 MET

    If you are indeed working on texts derived from the BNC, then a fairly
    obvious thing to check would be whether the lines are in fact duplicated in
    the BNC itself. Go to http://sara.natcorp.ox.ac.uk/lookup.html and type one
    of your repeated phrases into the box.

    There are (still) a few erroneous text duplications. More interestingly,
    there are several cases of genuine repetition-with-variants, caused by
    different newspapers (or the same newspaper at different times) re-using the
    same agency material.
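
    As a rough illustration of how both kinds of repetition could be checked
    locally, here is a minimal Python sketch. It assumes the texts have been
    exported as plain lines; the function names and the 0.9 similarity
    threshold are hypothetical choices for the example, not anything provided
    by WordSmith or the BNC tools.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalise(line):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(line.lower().split())


def exact_duplicates(lines):
    """Map each normalised line occurring more than once to its 1-based line numbers."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        key = normalise(line)
        if key:  # skip blank lines
            seen[key].append(number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}


def near_duplicates(lines, threshold=0.9):
    """Return (i, j, ratio) for pairs of non-identical lines with similarity >= threshold.

    The threshold is an assumed value; tune it against your own data.
    """
    keyed = [(number, normalise(line)) for number, line in enumerate(lines, start=1)]
    keyed = [(number, key) for number, key in keyed if key]
    pairs = []
    for a in range(len(keyed)):
        for b in range(a + 1, len(keyed)):
            (i, x), (j, y) = keyed[a], keyed[b]
            if x == y:
                continue  # identical lines are reported by exact_duplicates
            ratio = SequenceMatcher(None, x, y).ratio()
            if ratio >= threshold:
                pairs.append((i, j, ratio))
    return pairs


if __name__ == "__main__":
    sample = [
        "The minister said the talks had failed.",
        "the minister  said the talks had failed.",
        "The minister said that the talks had failed.",
        "Something else entirely.",
    ]
    print(exact_duplicates(sample))
    print(near_duplicates(sample))
```

    The pairwise comparison is quadratic in the number of lines, so it suits a
    set of concordance lines rather than a whole corpus.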

    If you're not using the BNC, of course, this is irrelevant, except insofar
    as it illustrates the general principle that one should *always* suspect
    the data!

    Lou

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Anne Harrap
    Sent: 17 December 2002 10:52
    To: corpora list - messages to list
    Subject: [Corpora-List] Wordsmith concordance

    Does anyone else get a lot of duplicated entries when doing a
    concordance in Wordsmith?

    Not sure if this is a bug or we are doing something wrong...

    Anne Harrap
    Languages Centre Documentalist
    School of Languages
    Oxford Brookes University
    Oxford (UK)

    Tel: +44 865 483723
    Fax: +44 865 483791
    Email: anneh@sol.brookes.ac.uk
