[Corpora-List] The size of Internet in words

From: Serge Sharoff (s.sharoff@leeds.ac.uk)
Date: Tue Jan 20 2004 - 17:22:42 MET

    Does anyone know the size of the Internet in words, broken down by
    language? Google shows the number of indexed documents on its front page
    (3,307,998,701 at the time of writing), and there is a comparative
    analysis of the databases used by various search engines at:
    http://www.searchengineshowdown.com/stats/sizeest.shtml

    Two things cannot be determined from these statistics: the number of
    words of real text per page, and the amount of text in a given language.

    The first question is partly addressed by an older statistical survey:
    http://searchengineshowdown.com/stats/nature99.shtml
    Can we conclude from its figure of 6 terabytes for 800 million pages
    that the average page length is 7.5 KB, or about 1,000 words (in
    English)? If so, with roughly 3.3 billion pages at 1,000 words each, the
    modern Internet would contain about 3 terawords, if it were English only.
    But can we trust this estimate, and how are these words distributed
    across languages?
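    The back-of-envelope arithmetic above can be sketched as follows. This is
    only an illustration under stated assumptions: decimal units (1 KB =
    1,000 bytes), about 7.5 bytes of text per English word (which is simply
    what the "7.5 KB is about 1,000 words" equivalence implies), and the
    further assumption that the 1999 average page length still holds for the
    current Google index. The constant and function names are hypothetical,
    and the sketch says nothing about the distribution over languages.

```python
# Back-of-envelope estimate of the size of the Web in words.
# Assumptions (not measured): decimal units, ~7.5 bytes of text per English
# word, and that the 1999 average page length still applies to today's index.

TEXT_BYTES_1999 = 6e12             # ~6 terabytes of text (1999 survey figure)
PAGES_1999 = 800e6                 # ~800 million pages (1999 survey figure)
BYTES_PER_WORD = 7.5               # assumption: implied by 7.5 KB ~ 1,000 words
GOOGLE_PAGES_2004 = 3_307_998_701  # count shown on Google's front page


def avg_page_bytes(text_bytes: float, pages: float) -> float:
    """Average amount of text per page, in bytes."""
    return text_bytes / pages


def estimate_words(pages: float, page_bytes: float, bytes_per_word: float) -> float:
    """Total words = number of pages * words per page."""
    return pages * page_bytes / bytes_per_word


page_bytes = avg_page_bytes(TEXT_BYTES_1999, PAGES_1999)
words_per_page = page_bytes / BYTES_PER_WORD
total_words = estimate_words(GOOGLE_PAGES_2004, page_bytes, BYTES_PER_WORD)

print(f"average page: {page_bytes / 1000:.1f} KB, ~{words_per_page:.0f} words")
print(f"estimated size: {total_words / 1e12:.2f} terawords (English only)")
```

    Running it prints an average page of 7.5 KB (about 1000 words) and an
    estimate of about 3.31 terawords. The result scales linearly with the
    bytes-per-word assumption, so a language with longer or shorter average
    words would shift the figure accordingly.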

    Best,
    Serge

    --
    Dr. Serge Sharoff
    Centre for Translation Studies
    School of Modern Languages and Cultures
    University of Leeds
    Leeds, LS2 9JT
    

    tel: +44(0)113 343 7287 fax: +44(0)113 343 3287 WWW: http://www.comp.leeds.ac.uk/ssharoff/


