Re: Corpora: Bilogarithmic type/token ratio

ucleacf (ucleacf@ucl.ac.uk)
Fri, 12 Sep 97 10:43:17 +0000

At 08:55 AM 12/9/97 +0200, Alice Carlberger wrote:

>inflection). Or could anyone suggest other measures of complexity, i.e., style,
>that are more appropriate for cross-linguistic use? Any help in this matter
>would be greatly appreciated.

While developing an efficient multilingual text alignment algorithm, we
found that the number of verbs appears to be fairly constant across
sentence pairs in European languages that are mutual translations. The
evidence so far comes from English and Portuguese. See

J. Campbell, N. Chatterjee, A.C. Fang, and M. Manela. 1996. Improving
Automated Alignment in Parallel Corpora. In Language, Information and
Computation: PACLIC 11, ed. by B-S. Park and J-B. Kim. Language Education
and Research Institute, Kyung Hee University, Seoul, Korea. pp 63-72.
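
As a rough illustration of the verb-count observation (not the algorithm from the paper above), the sketch below counts the verbs in each sentence of an already tagged pair and reports how far the two counts differ. It assumes a tag set in which every verb tag starts with "V"; the function names, the sample sentences and their tags are hypothetical.

```python
from typing import List, Tuple

# A tagged sentence is a list of (token, part-of-speech tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def count_verbs(sentence: TaggedSentence, verb_prefix: str = "V") -> int:
    """Count tokens whose tag starts with verb_prefix (an assumed convention)."""
    return sum(1 for _, tag in sentence if tag.startswith(verb_prefix))


def verb_count_difference(source: TaggedSentence, target: TaggedSentence) -> int:
    """Absolute difference in verb counts between two candidate translations."""
    return abs(count_verbs(source) - count_verbs(target))


# Hypothetical, hand-tagged sentence pair (tags are illustrative only).
english = [("She", "PRP"), ("wants", "VBZ"), ("to", "TO"),
           ("read", "VB"), ("the", "DT"), ("book", "NN")]
portuguese = [("Ela", "PRON"), ("quer", "V"), ("ler", "V"),
              ("o", "ART"), ("livro", "N")]

difference = verb_count_difference(english, portuguese)
print(count_verbs(english), count_verbs(portuguese), difference)
```

A small difference would count as weak evidence that the two sentences are translations of each other; how that evidence is weighed against other alignment cues is left open here.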

It could be interesting to investigate whether variation in the number
of verbs across two texts (in two different European languages) could
serve as a pointer to text complexity. We already know that this
parameter distinguishes different genres of speech and writing in English.
See, for instance,

Fang, A.C. 1995. The distribution of infinitives in Contemporary British
English - a study based on the British ICE Corpus. In Oxford Literary and
Linguistic Computing, 10:4. pp 247-257.

Fang, A.C. Forthcoming. Verb Forms and Subcategorisations. In Oxford
Literary and Linguistic Computing, 12:4.
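
The comparison across texts suggested above could be sketched as follows: compute the proportion of verb tokens in each tagged text and compare the two values. This is only an illustration under stated assumptions; the "V" tag prefix, the function name verb_ratio and the sample data are hypothetical, and no threshold for "complex" is implied.

```python
from typing import List, Tuple

# A tagged text is a flat list of (token, part-of-speech tag) pairs.
TaggedText = List[Tuple[str, str]]


def verb_ratio(tagged_text: TaggedText, verb_prefix: str = "V") -> float:
    """Proportion of tokens tagged as verbs; 0.0 for an empty text."""
    if not tagged_text:
        return 0.0
    verbs = sum(1 for _, tag in tagged_text if tag.startswith(verb_prefix))
    return verbs / len(tagged_text)


# Two tiny hypothetical texts with illustrative tags.
text_a = [("We", "PRON"), ("came", "V"), ("and", "CONJ"),
          ("saw", "V"), ("it", "PRON")]
text_b = [("The", "ART"), ("long", "ADJ"), ("road", "N"), ("ends", "V")]

print(verb_ratio(text_a), verb_ratio(text_b))
```

Normalising by text length keeps the figure comparable between texts of different sizes, which a raw verb count would not.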

There are, of course, many other and better measures, but automatic
part-of-speech tagging is now relatively easy and inexpensive, which
makes verb counts cheap to obtain.

Hope this helps.

Alex
---------------------------------------
Alex Chengyu Fang
Research Fellow
Department of Phonetics and Linguistics
University College London
Gower Street, London WC1E 6BT, U.K.

E-Mail: alex@phonetics.ucl.ac.uk
http://www.phon.ucl.ac.uk/home/alex/home.htm

Tel: 0171 388 4309
0171 387 7050 ext. 3169
Fax: 0171 383 4108
---------------------------------------