Corpora: Phrase list analysis

angela_kluge@sil.org
Tue, 09 Nov 1999 19:57:51 -0500

Dear List Members,

I would appreciate it if anyone could point me towards research or
publications which focus on the analysis of grammatical features across
different speech varieties.

Let me explain in more detail what I mean.

These past years I've been involved with SIL International (also known as
the Summer Institute of Linguistics) in language assessment surveys in West
Africa. Part of this involvement was in SIL's larger study of the Gbe
language continuum (Kwa language family).

I'm currently pursuing an MA study scheme at Cardiff University in Language
and Communication Research, and I would like to write my thesis on some
aspects of our Gbe study, which is why I'm writing to you.

The Gbe language varieties (Kwa language group) are spoken in the
south-eastern part of West Africa. Expanding westward from south-western
Nigeria, the Gbe language communities occupy large areas in southern Benin
and Togo, as well as in south-eastern Ghana.

Based on an extensive comparative study of the Gbe language continuum by
Capo (1986), SIL chose 45 varieties of the Gbe continuum for the elicitation
of word and phrase lists. The purpose of the elicitation was to determine
the degree of linguistic similarity among these varieties, their initial
clustering and geographical distribution, and to establish priorities for
the second stage of SIL's study of the Gbe language continuum.

My question concerns the analysis of these phrase lists.

The phrase list focuses on the verbal and the person (or noun) reference
systems. For greater reliability, most grammatical features were elicited
in at least two phrases; 35 phrases are listed, focusing on 17 different
grammatical features.

After their elicitation, the data were entered into a word processor. For
further analysis of the elicited items, a set of guidelines for similarity
groupings based on shared grammatical features was established. In order to
arrive at a statistical evaluation of the similarity groupings, the elicited
forms were analysed with the computer program Wordsurv (Wimbish 1989).
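
As a rough illustration of how a percentage matrix can be derived from
similarity groupings, here is a minimal Python sketch. It is not Wordsurv's
actual algorithm; the variety names, group labels and the function name
percentage_matrix are hypothetical. Each variety gets one group label per
record, and two varieties count as similar on a record when their labels
match.

```python
from itertools import combinations

# Hypothetical groupings: one label per record, None where no form was elicited.
groupings = {
    "A": [1, 1, 2, 1],
    "B": [1, 2, 2, 1],
    "C": [2, 2, None, 1],
}

def percentage_matrix(groupings):
    """Return pairwise percentages of records on which two varieties share a group."""
    varieties = sorted(groupings)
    matrix = {v: {v: 100.0} for v in varieties}
    for a, b in combinations(varieties, 2):
        # Compare only records for which both varieties have a form.
        pairs = [(x, y) for x, y in zip(groupings[a], groupings[b])
                 if x is not None and y is not None]
        shared = sum(1 for x, y in pairs if x == y)
        pct = 100.0 * shared / len(pairs) if pairs else float("nan")
        matrix[a][b] = matrix[b][a] = pct
    return matrix

matrix = percentage_matrix(groupings)
# A and B fall into the same group on 3 of their 4 shared records.
print(matrix["A"]["B"])
```

With so few records, each record shifts a percentage by several points, so
small differences between pairs of varieties should be read with caution.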

However, Wordsurv is not designed to analyse grammatical features but to
analyse lexical similarity. Therefore, the computed percentage and variance
matrixes do not necessarily reflect the actual degree of grammatical
similarity between these varieties, especially in light of the low number of
records (55 for the 35 phrases). However, the computed percentage matrixes
do indicate patterns of language convergence and divergence based on shared
grammatical features. Most interestingly, these patterns match to a large
extent the patterns provided by Wordsurv for the analysis of the elicited
word lists.
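
One way to put a number on how closely two such matrices match is to
correlate their off-diagonal entries. The sketch below is an assumption
about how this could be done, not part of the SIL study; the percentages
and the function name matrix_correlation are invented for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical percentage matrices for three varieties (invented values).
word_list = {
    "A": {"B": 90.0, "C": 70.0},
    "B": {"A": 90.0, "C": 80.0},
    "C": {"A": 70.0, "B": 80.0},
}
phrase_list = {
    "A": {"B": 75.0, "C": 35.0},
    "B": {"A": 75.0, "C": 55.0},
    "C": {"A": 35.0, "B": 55.0},
}

def matrix_correlation(m1, m2):
    """Pearson correlation over the pairwise entries of two symmetric matrices."""
    varieties = sorted(m1)
    xs = [m1[a][b] for a, b in combinations(varieties, 2)]
    ys = [m2[a][b] for a, b in combinations(varieties, 2)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = matrix_correlation(word_list, phrase_list)
print(r)
```

Because the entries of a similarity matrix are not independent of one
another, an ordinary significance test on this coefficient would be
misleading; a permutation approach such as the Mantel test is the usual
remedy.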

For my MA thesis I would like to explore the relationship between the word
and phrase list results more thoroughly. To do this, I will need to do some
background research on studies in which grammatical features were compared
in a similar manner across speech varieties.

Do you know of any studies where grammatical features were compared in a
similar manner, that is, by statistical means?

I'd be most grateful if you could let me know of anything that might be
helpful within the realm of my thesis.

Thanks a lot,

Angela Kluge,

Post-Graduate Student
Centre for Language & Communication Research
Cardiff University
PO Box 94, Cardiff CF1 3XB, Wales, UK.


References:

Capo, H.B.C. 1986. Renaissance du gbe. Une langue de l'Afrique
occidentale. Etude critique sur les langues ajatado: l'ewe, le fon, le
gen, l'aja, le gun, etc. Université du Bénin. Institut National des
Sciences de l'Education. Etudes et Documents de Sciences Humaines. Série
A: Etudes, Numéro 13. Lomé.

Wimbish, J.S. 1989. Wordsurv: A program for analyzing language survey
word lists. Vers. 2.4. Dallas, TX: SIL.