Re: [Corpora-List] N-gram string extraction

From: Christer Johansson (christer.johansson@lili.uib.no)
Date: Wed Aug 28 2002 - 13:27:32 MET DST


    andrius@ccl.bham.ac.uk wrote:
     ...
    > It's running for the 7th day now.
    >

    My guess:

    Somewhere a sort operation is needed. I guess that the sort is
    implemented in a "simple for the programmer" way, which means that its
    running time is likely somewhere between n*n and n*n*n. Unix sort uses
    more efficient algorithms that are more likely n*log n. One million keys
    would take between 10^12 and 10^18 operations in the slow versions; in
    the fast version it is 10^6 * log2(10^6), which is about 20*10^6. This
    is most likely where your problem is.
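
    A small Python sketch of the arithmetic, ignoring constant factors.
    The function names are only illustrative, and the selection sort is an
    assumed stand-in for a "simple for the programmer" sort; I have not
    seen the actual program, so this only shows the size of the gap.

```python
import functools
import math
import random


def estimated_operations(n):
    """Rough operation counts for sorting n keys, ignoring constant factors."""
    return {
        "n*n": n ** 2,
        "n*n*n": n ** 3,
        "n*log2(n)": n * math.log2(n),
    }


def selection_sort_comparisons(keys):
    """Sort with a naive selection sort and count key comparisons."""
    a = list(keys)
    count = 0
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            count += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a, count


def builtin_sort_comparisons(keys):
    """Sort with Python's built-in sort and count key comparisons."""
    count = 0

    def compare(x, y):
        nonlocal count
        count += 1
        return (x > y) - (x < y)

    return sorted(keys, key=functools.cmp_to_key(compare)), count


if __name__ == "__main__":
    # Estimates for one million keys.
    for name, ops in estimated_operations(10 ** 6).items():
        print(f"{name:>10}: {ops:.3g}")

    # Measured comparison counts on a small random sample.
    rng = random.Random(0)
    sample = [rng.random() for _ in range(1000)]
    _, slow = selection_sort_comparisons(sample)
    _, fast = builtin_sort_comparisons(sample)
    print(f"selection sort: {slow} comparisons")
    print(f"built-in sort:  {fast} comparisons")
```

    With only 1000 keys the naive sort already needs n*(n-1)/2 = 499500
    comparisons, and the gap grows quickly as n increases.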

       /Christer


