Re: Corpora: Bigram, Trigram and ???gram if N=4 ???

Adam Kilgarriff (Adam.Kilgarriff@itri.brighton.ac.uk)
Wed, 23 Jul 1997 11:05:40 +0100

4-gram is far preferable to invented words. It has the not
inconsiderable advantage that everyone understands it and uses it already.

The logic of fancy latinate forms in English resides in the depths of
British class structure as promulgated through the study of classics
at public (i.e. private) schools. It suits the ruling classes to baffle
their underlings by using words which require knowledge of dead
languages to understand. The translation of the Bible into the "vulgar
tongues" (e.g. from Latin/Greek into the languages people spoke) was a
body-blow to feudalism, but the work isn't finished yet.

You'll also hit a problem with 5-gram, which would presumably be
'pentagram' - phonetically fine, but unfortunately it already has a
meaning: "a five-pointed star used as a magic sign".

Adam Kilgarriff

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
Senior Research Fellow                       tel:   (44) 1273 642919
Information Technology Research Institute           (44) 1273 642900
University of Brighton                       fax:   (44) 1273 642908
Lewes Road
Brighton BN2 4GJ                             email: Adam.Kilgarriff@itri.bton.ac.uk
UK                                           http://www.itri.bton.ac.uk/~Adam.Kilgarriff
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%