Re: Corpora: Latinate words in corpora

Jean Veronis (Jean.Veronis@newsup.univ-mrs.fr)
Wed, 20 Oct 1999 12:19:32 +0200

At 09:51 20/10/99 +0800, Chris Allen wrote:
>I wondered if anyone on this list could help me with an enquiry.
>
>A student of mine is interested in obtaining frequency information for
>Latin words using a corpus. In particular, she would like to come up with a
>top 10 list of the most frequent Latinate words in English.
>
>Does anyone know of a corpus which is in some way 'tagged' according to
>etymological origin? The only thing I can remotely think of would be the
>dictionary database of a historically-orientated dictionary such as the OED
>which might be able to supply such etymological information.
>

The first step is probably for her to define exactly what she is looking
for. "Latin word" is probably not the same as "from Latin etymology".

At one end of the spectrum there are cited words or expressions such as "o
tempora! o mores!" which are probably recognized as foreign by most English
speakers; at the other end, there are words like "computer" (and a good
chunk of the English lexicon!) which derive from Latin words ("computare",
through the French "computer"), but are probably not perceived as
foreign. In between there is a wide range of cases, such as frozen
expressions which are lexicalized ("ad hoc") or less lexicalized ("ad
hominem"), words such as "corpus" and "index" which keep their Latin
morphology but are probably not perceived as Latin by many people,
scientific terms which have been constructed from Latin (plant names such
as "Caulerpa taxifolia"), etc.

Therefore, depending on how exactly she defines her problem, she will find
widely different figures.