Re: Corpora: minimum size of corpus?

From: ramesh@clg.bham.ac.uk
Date: Thu Feb 10 2000 - 02:08:19 MET


    If there are only 10 chapters, 276 verses, of Biblical Aramaic extant,
    then that's the biggest corpus of Biblical Aramaic the world is ever going
    to see.
    I don't know how many "words" there are in an average verse, but say there
    are 20, you'll have a corpus of c. 5,520 words. You may be able to discover
    some interesting features in the word-frequency list, especially by comparing
    the list with word frequencies for other small corpora of similar size, and
    especially other corpora of similar content, in Aramaic or other languages.
    You may also be able to find interesting features in repeated phraseologies,
    again more so with contrastive studies.
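    As a rough illustration of the word-frequency comparison suggested above,
    here is a minimal Python sketch. The function names and the naive
    whitespace tokenizer are assumptions made for the example; a real study of
    Aramaic would need proper tokenization and probably lemmatization.

```python
from collections import Counter


def relative_frequencies(text):
    """Return each word's rate per 1,000 tokens (naive whitespace tokenizer)."""
    tokens = text.lower().split()
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    # Normalizing by corpus size makes corpora of different lengths comparable.
    return {word: 1000 * n / total for word, n in counts.items()}


def compare(corpus_a, corpus_b, top=5):
    """List the words whose per-1,000 rate differs most between two corpora."""
    freq_a = relative_frequencies(corpus_a)
    freq_b = relative_frequencies(corpus_b)
    words = set(freq_a) | set(freq_b)
    # Positive difference: relatively more frequent in corpus_a.
    diffs = {w: freq_a.get(w, 0.0) - freq_b.get(w, 0.0) for w in words}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
```

    Calling compare(text_a, text_b) on two small corpora of similar size gives
    a first list of candidate words to inspect by hand; the same idea extends
    to repeated phrases by counting word n-grams instead of single words.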
    Forensic linguistics has been looking at the problems of using quantitative
    methods for short texts (suicide notes, threatening letters, etc.) and
    small corpora (e.g. small sets of witness statements, one of which may be
    disputed). Some colleagues at Birmingham may have clearer ideas on this.
    But software tools that use statistical methods tend to yield more
    reliable results when applied to larger corpora, as far as I understand
    the maths involved (which isn't very far!).
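    One way to see why size matters, as a sketch only: if each token is
    treated as an independent draw (a simplifying binomial assumption that
    real text does not satisfy), the uncertainty in an observed word
    frequency shrinks with the square root of the corpus size. The function
    name below is hypothetical.

```python
import math


def relative_standard_error(p, n):
    """Standard error of an observed proportion p over n tokens, as a fraction of p.

    Assumes a binomial model (independent tokens), which is only an approximation.
    """
    return math.sqrt(p * (1 - p) / n) / p


# A word occurring once per 1,000 tokens, measured in corpora of growing size.
for size in (5_000, 50_000, 5_000_000):
    print(size, round(relative_standard_error(0.001, size), 3))
```

    Under this model, a tenfold increase in corpus size reduces the relative
    error by a factor of about 3.2, which is why rare words in a corpus of a
    few thousand tokens give very unstable estimates.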

    Ramesh Krishnamurthy
    Corpus Research Group
    University of Birmingham


