RE: [Corpora-List] The size of Internet in words

From: Mark Davies (Mark_Davies@byu.edu)
Date: Tue Jan 20 2004 - 20:22:11 MET


    Serge Sharoff wrote:

    > Does anyone know the size of Internet in terms of words and
    > relative to languages [and] the amount of texts for a
    > given language?

    Some possible starting points:

    (Previous CORPORA discussion; Dec 2001)
    http://helmer.aksis.uib.no/corpora/2001-4/0161.html

    (Paper by Bill Fletcher; originally written 2001)
    http://www.kwicfinder.com/FletcherCLLT2001.pdf

    (Widely quoted 2000 paper by Grefenstette and Nioche)
    http://arxiv.org/ftp/cs/papers/0006/0006032.pdf

    (Dec 2003; by language; but not by words)
    http://www.caslon.com.au/metricsguide6.htm

    (April 2003; by language; but not by words)
    http://www.dlib.org/dlib/april03/lavoie/04lavoie.html

    Mark Davies

    =================================================
    Mark Davies
    Assoc. Prof., Linguistics
    Brigham Young University
    (phone) 801-422-9168 / (fax) 801-422-0906
    http://davies-linguistics.byu.edu

    ** Corpus design and use // Web-database scripting **
    ** Historical linguistics // Functional-typological grammar **
    ** Spanish and Portuguese historical and dialectal syntax **
    =================================================


