RE: [Corpora-List] Analysing Reuters Corpus Using Wordsmith Version 3

From: Ute Römer (ute.roemer@uni-koeln.de)
Date: Fri Jun 11 2004 - 18:51:40 MET DST


    Tony,

    > > Btw, have you (or anyone else) done a proper word count of the corpus?
    > > (the RC distributors told me they hadn't) -- Using MP2.2 would of course
    > > be a solution to that problem since it does a word count whenever you
    > > load a corpus anyway.
    >
    > FYI you can find lots more statistics on the corpus at:
    >
    > http://about.reuters.com/researchandstandards/corpus/statistics/index.asp

    Yes, I've seen the statistics on the Reuters pages, thanks. You offer a lot
    of diagrams on interesting features such as the distribution of stories
    across days or the POS distribution, but unfortunately there is no
    word/token count for the entire corpus (or perhaps I missed it). Has
    anybody else done such a word count?

    Best... Ute

     


