I am looking for tables that list some kind of "standard ratio" between
average sentence lengths, in words and in characters, across various
language pairs, e.g. "German : English - #words 1.2, #chars 1.16" or
similar (these figures are made up ;-) ). I assume that such figures
depend on text type, so it would be difficult to give accurate figures
for the "general" case, but some average values would be fine as a
starting point. The figures could easily be computed from large,
balanced (or, perhaps preferably, parallel) corpora with marked sentence
boundaries, but, first, I don't have such corpora at hand, and second,
I need these figures for a large number of language pairs (as many as
possible).
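Just to make clear what I mean by "computing" the figures, here is a minimal Python sketch. It assumes a sentence-aligned parallel corpus given as two lists of strings (sentence i of one list translates sentence i of the other). It splits words on whitespace, which won't work for languages like Chinese or Japanese, and it counts characters excluding whitespace. The function name length_ratios and the two toy sentence pairs are made up for illustration only.

```python
from statistics import mean


def length_ratios(src_sents, tgt_sents):
    """Ratio (source / target) of average sentence length in words and characters.

    Assumption: src_sents[i] and tgt_sents[i] are translations of each other,
    and words are whitespace-separated (not true for e.g. Chinese or Japanese).
    """
    if len(src_sents) != len(tgt_sents) or not src_sents:
        raise ValueError("need two non-empty, sentence-aligned lists")

    def avg_words(sents):
        # average number of whitespace-separated tokens per sentence
        return mean(len(s.split()) for s in sents)

    def avg_chars(sents):
        # average number of non-whitespace characters per sentence
        return mean(sum(not c.isspace() for c in s) for s in sents)

    return {
        "words": avg_words(src_sents) / avg_words(tgt_sents),
        "chars": avg_chars(src_sents) / avg_chars(tgt_sents),
    }


# Toy example (hypothetical sentences, not real corpus data)
german = ["Das ist ein kleines Haus.", "Ich habe heute keine Zeit."]
english = ["This is a small house.", "I have no time today."]
print(length_ratios(german, english))
```

With a real corpus, the averages would of course have to be taken over many thousands of sentences, and ideally per text type.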
If you have pointers to any relevant information, please email me
directly; I'll post a summary here, of course.
Have a nice day,
Oli
PS: If there is any garbage after the text of this mail, it's because I
haven't yet found a way to tell Outlook 97 not to include it ;-)