Andrew and Spela:
Just a word of caution: studies like Spela's provide interesting
and suggestive data, but figures will surely vary, depending on the
translator, topic, etc. [all the usual sociolinguistic caveats apply
here] (and note Jean's contribution, with varying rates). I was
coauthor of a study comparing English and Spanish, which basically tried
to get Spanish to fit into the standard readability curves in a fairly
simple way. We were only partially successful (the counts were
hand-done by yours truly, featuring a variety of types of text,
pseudo-randomly sampled, and especially translations from one
language to the other, as well as translations from 3rd languages
[French & German] into each). To the best of my recollection (I could
look up the exact figures if anyone is hot for them), our results for
Spanish-English were rather close to Jean's for French (I assume his
were on large amounts of text done by computer--if this holds up [not
surprising, given the close relationship of French and Spanish], it may
indicate that, for this kind of data, not such a huge amount of text is
On Wed, 24 Apr 2002, spela vintar wrote:
>for Eastern-European languages you can compare the lengths of Orwell's 1984
>and its translations that were collected within the Multext-East project.
>The original Multext project (http://www.lpl.univ-aix.fr/projects/multext/)
>should provide the same for English, German, French, Spanish etc., however I
>wasn't able to find it on their homepage at first glance...
>Below we give an estimate for the number of words, by language. The
>wordcounts were produced by removing the SGML tags from the texts and then
>using a 'wc'-like procedure.
>Andrew Bredenkamp wrote:
>> Hello everyone,
>> Does anyone know where I can find a list of relative text length?
>> Taking one language as an index (100), I would like a list of the (other)
>> main European languages - e.g. (made up):
>> Spanish: 100
>> English: 105
>> French: 110
>> German: 85
>> ... etc.
>> Thanks a lot in advance for any help you can give me.
>> Andrew Bredenkamp
>> acrolinx GmbH
>> URL: www.acrolinx.com
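
For anyone who wants to reproduce the kind of count described above (strip the SGML tags, then count words 'wc'-style) and turn the counts into the sort of index Andrew asks for (one language = 100), here is a minimal Python sketch. The tag regex, the function names, and the toy samples are illustrative assumptions, not the actual Multext-East procedure or its data.

```python
import re

# Assumption: a "tag" is anything between '<' and '>'. Real SGML can be
# messier (comments, marked sections), so treat this as a rough filter.
TAG_RE = re.compile(r"<[^>]*>")


def count_words(sgml_text: str) -> int:
    """Strip SGML-style tags, then count whitespace-separated tokens,
    roughly what `wc -w` would report on the stripped text."""
    # Tags are replaced by a space so that words on either side of a tag
    # do not fuse; a tag in the middle of a word would split that word.
    plain = TAG_RE.sub(" ", sgml_text)
    return len(plain.split())


def length_index(counts: dict[str, int], base: str) -> dict[str, float]:
    """Express each word count relative to the base language (= 100)."""
    base_count = counts[base]
    return {lang: round(100 * n / base_count, 1) for lang, n in counts.items()}


# Made-up toy samples, NOT taken from any corpus.
samples = {
    "en": "<p>one two <b>three</b> four</p>",
    "es": "<p>uno dos tres</p>",
}
counts = {lang: count_words(text) for lang, text in samples.items()}
print(length_index(counts, base="en"))
```

Whitespace tokenization is the simplest choice and matches 'wc', but it counts clitics, hyphenated compounds, and unexpanded SGML entities as single words, so figures will shift with the tokenizer as well as with translator and topic.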
--
James L. Fidelholtz
Posgrado en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO
e-mail: firstname.lastname@example.org
tel.: +(52-2)229-5500 x5705
fax: +(01-2) 229-5681
This archive was generated by hypermail 2b29 : Thu Apr 25 2002 - 17:30:26 MET DST