Corpora: Historical background of Corpus Linguistics

From: John McKenny (
Date: Fri Apr 19 2002 - 14:27:39 MET DST

    Dear Charo
    My money's on Jonathan Swift as a clear precursor of corpus methodology. In
    Gulliver's Travels (1726) Part III, Ch 5 he describes a machine which
    generates books of 'philosophy, poetry, politics, law, mathematics and
    theology without the least assistance from genius or study'. The professor
    of the Academy of Lagado who invented this engine (worked by 40 pupils who
    cranked handles and transcribed the output) told Gulliver that he had
    "emptied the whole vocabulary into the frame and made the strictest
    computation of the general proportion there is in books between the number
    of particles, nouns, and verbs, and other parts of speech". It is well
    known that Swift is a/the master of satire and that he was having a go at
    the Royal Society in this passage but he shows that he had thought through
    a lot of what would later become AI. He also draws attention, through the
    professor of Lagado Academy, to the importance of prefabs in building or
    reconstituting text.
    In his introduction to "A complete collection of genteel and ingenious
    conversation" (1738) Swift takes up prefabs again and describes how he
    built up a collection of fashionable sayings over 12 years' fieldwork: "I
    determined to spend 5 mornings, to dine 4 times, pass three afternoons, and
    six evenings every week in the houses of the most polite families...I
    always kept a large table-book in my pocket; and as soon as I left the
    company I immediately entered the choicest expressions." He then spent a
    further 16 years "digesting it into a method". Finally he sat on his work
    for a further six or seven years, observing: "I have not been able to add
    above nine valuable sentences to enrich my collection; from whence I
    conclude that what remains will amount only to a trifle".
    Nowadays Swift's collection of smart chat might contain in the blurb that
    it was based on the author's own corpus, which was more than 30 years in the
    making.
    A passage in Section 1, the Introduction, of an earlier work, A Tale of a Tub
    (1704), provides further justification for making Swift the patron saint of
    Corpus Linguistics if such an honour were not anathema (wrong word!) to the
    Dean's stern Protestant anti-popery. Or, at any rate, a prime precursor.
    "I am informed our ...rivals... challenge us to a comparison of books, both
    as to weight and number...we are ready to accept the challenge..." Although
    he's always playful and elusive, I think he shows a genuine fascination with
    the quantificational, physical side of language. Have I been taken in by
    one of the greatest hoaxers of all time?
    Mucha suerte

    John McKenny
    Departamento de Gestão
    Escola Superior de Tecnologia de Viseu
    Campus Politécnico
    3500 Viseu
