Corpora: Keywords in Literary Texts Summary

From: T Murphy (tmorpheme@hotmail.com)
Date: Wed Jun 06 2001 - 02:42:39 MET DST


      
    Dear Corpora Listers:
    Here is a summary of my inquiry concerning the corpus analysis of keywords in literary texts.
    1. Mick Short noted that there is some discussion of keywords in the play Romeo and Juliet in chapter 4 of Jonathan Culpeper, Language and Characterisation (Longman 2001). Mick also suggested that it might be worth having a look at David Hoover's Language and Style in The Inheritors (University Press of America 1998), which compares Golding's book with various corpora.
    2. Christopher Tribble noted that M. Stubbs, Text and Corpus Analysis (1996) specifically mentions Raymond Williams' notion of keywords. Christopher also commented that Mike Scott has been doing work on cultural keywords using Guardian newspaper data.
    3. Adam Kilgarriff reminds me that Mike Scott's Wordsmith supports this sort of analysis, and that Tony Berber Sardinha knows a lot about the area, but from an EFL rather than a literary perspective.
    The following two leads were very useful:
    4. Ramesh Krishnamurthy has written "Ethnic, Racial and Tribal: The Language of Racism?" in Texts and Practices, eds. Caldas-Coulthard & Coulthard, Routledge, London, 1996. In this article, Ramesh looked at three keywords in the Bank of English corpus (then 121 million words, now 418 million words) and made specific references to Raymond Williams' Keywords.

    5. Andrius Utka, a master's student at Vytautas Magnus University, Faculty of Humanities, has done an analysis of George Orwell's 1984 using the statistical methods of corpus linguistics. It is available for viewing at http://donelaitis.vdu.lt, by following the link from "publications" to "sankirta".
    Among other things, this paper suggests a useful method for discovering what the keywords of a given literary text actually are:

    "The following procedure is used for finding key words in 1984:
    The frequency list of all word forms is produced by the computer program Wordsmith Tools.
    Only the 100 most frequent nouns are kept and all the other words are removed from the list.
    The nouns are lemmatized.
    The frequency list of these 100 nouns is produced from the large corpus of the Bank of English.
    The occurrences of words in both frequency lists are compared using the chi-squared statistical test.
    The key words are sorted according to the chi-square value."
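    The procedure quoted above can be sketched in a few lines of Python. This is a minimal illustration, not Utka's code and not how Wordsmith Tools computes keyness. It assumes the text has already been lemmatized and part-of-speech tagged (the `(lemma, tag)` pairs and the `"NOUN"` tag are hypothetical input conventions), so it lemmatizes before picking the top nouns rather than after. It uses the plain Pearson chi-squared on a 2x2 table (this word vs. all other words, the text vs. the reference corpus) without Yates' correction; the hypothetical `ref_freqs` dictionary stands in for the Bank of English frequencies.

```python
from collections import Counter


def chi_squared_2x2(a, b, text_size, ref_size):
    """Pearson chi-squared (no Yates correction) for one word's 2x2 table.

    a: occurrences of the word in the literary text
    b: occurrences of the word in the reference corpus
    text_size, ref_size: total tokens in the text and in the reference corpus
    """
    c = text_size - a  # other tokens in the text
    d = ref_size - b   # other tokens in the reference corpus
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def key_nouns(tagged_text, ref_freqs, ref_size, top_n=100):
    """Rank the top_n most frequent nouns of a text by chi-squared.

    tagged_text: list of (lemma, tag) pairs, ASSUMED already lemmatized and
        POS-tagged; the "NOUN" tag is a hypothetical convention.
    ref_freqs: dict mapping lemma -> count in the reference corpus
        (a hypothetical stand-in for Bank of English frequencies).
    Returns (lemma, text_count, ref_count, chi2) tuples, highest chi2 first.
    """
    text_size = len(tagged_text)
    # Keep only nouns, then the top_n most frequent of them
    noun_counts = Counter(lemma for lemma, tag in tagged_text if tag == "NOUN")
    rows = []
    for lemma, a in noun_counts.most_common(top_n):
        b = ref_freqs.get(lemma, 0)
        rows.append((lemma, a, b, chi_squared_2x2(a, b, text_size, ref_size)))
    # Final step of the procedure: sort by the chi-square value
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

    A large chi-squared value only says that the two frequencies differ; comparing a / text_size with b / ref_size shows whether a noun is over- or under-represented in the text, and it is usually the over-represented ones that are treated as keywords.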
    Finally, there were two respondents working on texts other than literary ones:
    6. Wendy J. Anderson, a PhD student in the Department of French at the University of St Andrews, is carrying out keyword analysis on administrative texts in French.
    7. Geoffrey Williams has done work on extracting keywords in scientific corpora. Geoffrey also notes that Berry-Roghe worked in a similar way on literary texts in the 1970s.
     The references that Geoffrey provided are:
    Berry-Roghe, G. L. M. 1973. "The computation of collocations and their relevance in lexical studies". In Aitken, A. J., Bailey, R. and Hamilton-Smith, N. (eds), The Computer and Literary Studies. Edinburgh: Edinburgh University Press.
    Williams, G. 1998. "Collocational Networks: Interlocking Patterns of Lexis in a Corpus of Plant Biology". International Journal of Corpus Linguistics 3(1): 151-171.
    Williams, G. 1999. Les réseaux collocationnels dans la construction et l'exploitation d'un corpus dans le cadre d'une communauté de discours scientifique [Collocational networks in the construction and exploitation of a corpus within a scientific discourse community]. PhD thesis in English corpus linguistics, Université de Nantes. http://perso.wanadoo/geoffrey.williams
    It seems clear that the field is still at a very early stage of development. I suspect, however, that it may grow over the next few years, although the non-literary areas could grow more quickly than the literary one that concerns me.
    Thanks very much to all who responded.
     Dr. Terry Murphy
    Yonsei University
    College of Liberal Arts
    Dept. of English Language and Literature
    Seoul 120-749
    Korea



    This archive was generated by hypermail 2b29 : Wed Jun 06 2001 - 02:32:17 MET DST