RE: Corpora: Relative text length

From: Tolkin, Steve (Steve.Tolkin@FMR.COM)
Date: Thu Apr 25 2002 - 21:21:07 MET DST


    The following was copied from
    http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
    on 2002-04-25. It lists the download sizes of the documentation
    for Microsoft SQL Server 2000 (Books Online) in nine languages.
    The data shows that, after compression, the languages produce
    about the same number of bytes. However, English is slightly
    smaller than the others (35.8 MB), the other seven fall between
    37.1 and 38.5 MB, and Japanese is somewhat larger (41.72 MB).
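
    To put numbers on "about the same", one can divide each download
    size by the English one. A minimal Python sketch, using the sizes
    from the quoted page below; keep in mind these are compressed .cab
    sizes that include markup and other non-text files, so the ratios
    are at best a rough proxy for relative text length.

```python
# Compressed .cab download sizes in MB, as listed on the quoted page.
SIZES_MB = {
    "English": 35.8,
    "Chinese (Simplified)": 37.1,
    "Chinese (Traditional)": 37.88,
    "French": 38.05,
    "German": 38.5,
    "Italian": 37.17,
    "Japanese": 41.72,
    "Korean": 37.89,
    "Spanish": 37.62,
}


def relative_sizes(sizes, baseline="English"):
    """Return each language's size divided by the baseline language's size."""
    base = sizes[baseline]
    return {lang: size / base for lang, size in sizes.items()}


if __name__ == "__main__":
    ratios = relative_sizes(SIZES_MB)
    # Print from smallest to largest relative size.
    for lang, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{lang:22s} {ratio:.3f}")
```

    On these numbers the seven middle languages come out roughly 4 to 8
    percent above English, and Japanese about 17 percent above.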

    People who are truly interested in this topic should download all
    these files, expand them, remove everything but the text, and then
    report the results back.
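
    The "remove everything but the text" step could look something like
    the Python sketch below. It assumes the help files have already been
    expanded (e.g. with Expand.exe) and converted into plain .htm/.html
    files, one directory per language; the directory layout, the
    function names, and the default file encoding are all hypothetical.
    Because the per-language files may use different legacy encodings,
    it reports both characters and a normalized UTF-8 byte count.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect character data, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &eacute; become text
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Strip markup and collapse whitespace runs to single spaces."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def measure(directory, encoding="utf-8"):
    """Return (characters, UTF-8 bytes) of text in all .htm/.html files under directory."""
    chars = utf8_bytes = 0
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in (".htm", ".html"):
            continue
        # errors="replace" keeps going on a wrong encoding guess, but garbles that file.
        text = html_to_text(path.read_text(encoding=encoding, errors="replace"))
        chars += len(text)
        utf8_bytes += len(text.encode("utf-8"))
    return chars, utf8_bytes


if __name__ == "__main__":
    # Hypothetical usage: python measure.py extracted/english extracted/japanese ...
    for directory in sys.argv[1:]:
        chars, utf8_bytes = measure(directory)
        print(f"{directory}: {chars} characters, {utf8_bytes} UTF-8 bytes")
```

    Character counts are not directly comparable between alphabetic and
    CJK scripts, so the byte count is probably the fairer single number.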

    Elsewhere, I recall reading that using text for lexicographic
    purposes (counting words, characters, etc.) is permitted under any
    interpretation of copyright.

    <quote>
    SQL Server 2000 Books Online (Updated)

    Posted: February 20, 2002

    Download

    Language                 Size       Time @ 28.8 kbps
    English                  35.8 MB    2 hr 54 min
    Chinese (Simplified)     37.1 MB    3 hr 0 min
    Chinese (Traditional)    37.88 MB   3 hr 4 min
    French                   38.05 MB   3 hr 5 min
    German                   38.5 MB    3 hr 7 min
    Italian                  37.17 MB   3 hr 0 min
    Japanese                 41.72 MB   3 hr 23 min
    Korean                   37.89 MB   3 hr 4 min
    Spanish                  37.62 MB   3 hr 3 min

    Download the updated documentation for Microsoft SQL Server 2000. SQL
    Server Books Online (Updated) includes the complete documentation that
    shipped with SQL Server 2000 plus minor revisions.

    SQL Server Books Online (Updated) is available for download as a
    cabinet file (.cab). This file contains multiple files that have been
    compressed into one extractable file. You can extract the compressed
    files by using an expansion utility such as Expand.exe,
     ...
    </quote>
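
    The quoted download times are consistent with reading "MB" as 2^20
    bytes and 28.8 kbps as 28,800 bits per second with no protocol
    overhead. That interpretation is my assumption, not something the
    page states; a small Python check of it:

```python
# Assumptions: 1 MB = 2**20 bytes, 28.8 kbps = 28,800 bits/s, no overhead.
BYTES_PER_MB = 2 ** 20
LINE_RATE_BPS = 28_800


def download_time(size_mb, rate_bps=LINE_RATE_BPS):
    """Return (hours, minutes) to transfer size_mb at rate_bps, rounded to the minute."""
    seconds = size_mb * BYTES_PER_MB * 8 / rate_bps
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


if __name__ == "__main__":
    for lang, size in [("English", 35.8), ("German", 38.5), ("Japanese", 41.72)]:
        hours, minutes = download_time(size)
        print(f"{lang}: {hours} hr {minutes} min")
```

    Under these assumptions all nine listed times come out exactly as
    quoted, so the listed sizes appear to be binary megabytes.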
     
    Hopefully helpfully yours,
    Steve

    -- 
    Steven Tolkin          steve.tolkin@fmr.com      617-563-0516 
    Fidelity Investments   82 Devonshire St. V8D     Boston MA 02109
    There is nothing so practical as a good theory.  Comments are by me, 
    not Fidelity Investments, its subsidiaries or affiliates.
    



    This archive was generated by hypermail 2b29 : Thu Apr 25 2002 - 21:21:18 MET DST