[Corpora-List] grammatical annotation of Ancient Greek and Latin texts: the summary

From: Daniel Riaño (danielrr@eresmas.net)
Date: Sun Dec 22 2002 - 00:26:24 MET

    Here's the promised summary. The answers were too few to give a
    general picture of the state of grammatical annotation of the
    ancient Greek and Latin corpus of texts, but I hope this very brief,
    incomplete and imperfect account can be of some use to somebody
    (mostly classicists; there is not much here for computational
    linguists). I have interspersed many comments of my own. I am still
    interested in receiving new information about this subject.

    Many thanks to Eckhard Bick, Rodney Decker, Tim Finney, Dennis
    Hukel, Kathleen McNamee, Tito Orlandi and especially to Anne Mahoney,
    who sent me a very useful summary of the notation schemes used for
    the morphological annotation of the Perseus texts.

            Well, as far as I have been able to gather, and leaving
    aside the Bible and the corpus of the earliest Greek Christian texts,
    it seems to me that there is still not much going on in the field of
    the grammatical annotation of the ancient Greek and Latin corpus
    outside the aegis of the Perseus project. Most of the work already
    done, and most of the existing projects, concern only the
    morphological annotation of the texts, either with automatic taggers
    (output not disambiguated) or with heavy intervention of human
    operators (output disambiguated), the latter being the preferred
    approach for Biblical texts.
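
    The difference between the two approaches can be shown with a
    minimal sketch. Everything in it is an assumption made for the
    sake of illustration: the toy lexicon, the tag strings and the
    function names are hypothetical and do not come from Perseus or
    from any Biblical software.

```python
# Toy lexicon (hypothetical data): each form maps to ALL its possible analyses.
LEXICON = {
    "rosae": ["noun gen sg", "noun dat sg", "noun nom pl", "noun voc pl"],
    "amat": ["verb 3sg pres ind act"],
}


def tag_undisambiguated(tokens):
    """Automatic tagging: attach every candidate analysis to each token."""
    return [(tok, LEXICON.get(tok, [])) for tok in tokens]


def disambiguate(tagged, choices):
    """Human step: keep one analysis per token.

    `choices` maps a token position to the index of the analysis
    selected by the operator; it is only consulted for ambiguous tokens.
    """
    result = []
    for position, (tok, analyses) in enumerate(tagged):
        if len(analyses) <= 1:
            # Nothing to decide: zero or one candidate analysis.
            result.append((tok, analyses[0] if analyses else None))
        else:
            result.append((tok, analyses[choices[position]]))
    return result


tagged = tag_undisambiguated(["rosae", "amat"])
resolved = disambiguate(tagged, {0: 0})  # operator picks the genitive singular
```

    The automatic step is cheap but leaves "rosae" with four
    candidate analyses; only the operator's choice reduces it to one.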

            On the Greek side we have the well-known collection of
    morphologically tagged literary texts, and of subliterary and
    documentary papyri and ostraka, from the Perseus project and the Duke
    Databank, accessible through the Web (<http://www.perseus.tufts.edu/>,
    <http://www.perseus.tufts.edu/cache/perscoll_DDBDP.html>). Few
    projects in the humanities can compare to the Perseus project in
    scope, progress, the mass of collected and processed data, and public
    success. Latinists, in addition to the Latin pages of the Perseus
    site, are indebted to the Belgian classicists for the remarkable
    CETEDOC / CTLO CD-ROMs (see
    <http://www.brepols.net/publishers/cd-rom.htm#CLCLT>) and the LASLA /
    CIPL databases (<http://www.ulg.ac.be/cipl/lsl.htm>); see
    <http://bcs.fltr.ucl.ac.be/DicLanD.html> for information on both
    projects.

            Bible students and researchers of the New Testament texts and
    related literature (in Greek, Latin and Hebrew) have unparalleled
    electronic tools at their disposal with BibleWindows
    (<http://www.silvermnt.com/bwinfo.htm>) and especially Accordance
    (<http://www.oaksoft.com/>). See the reviews at
    <http://www.swcp.com/~kfapa/Bible/index.html>. Yet there is not much
    on syntax there. The electronic version of the Nestle-Aland critical
    edition of the Greek New Testament looks promising (see
    <http://www.uni-tuebingen.de/cgi-bin/abs/abs?propid=54>).

            There are a few sites devoted to the grammatical analysis of
    the Greek and Latin corpus, such as <http://visl.sdu.dk>, which
    offers a small treebank of analysed Greek and Latin sentences with a
    tree visualizer.

            The OpenText project (<http://www.opentext.org>) offers a
    very interesting proposal for another kind of text annotation, for
    the edition of ancient texts (mainly intended for papyri).

            For the serious exploitation of syntactically analysed
    corpora specifically designed for Greek texts (my main concern) there
    is much less to see. Any researcher interested in publicising his
    work might be interested in the TIGER facilities for the public
    exploitation of existing banks of syntax graphs. Don't miss the Tiger
    Project page
    <http://www.ims.uni-stuttgart.de/projekte/TIGER/annotation/lfg/parsing/>
    and see what's there!

            To the best of my knowledge, there is no project to
    develop a parser for the automatic analysis of Ancient Greek texts,
    and that's a pity. However tentative such analyses are today (and
    will remain in the foreseeable future), the creation of such tools
    would help syntacticians to improve the existing grammars and would
    provide the general scholar with annotated texts (after heavy human
    intervention).

            Some years ago I wrote a paper on the syntactical annotation
    of the corpus of ancient Greek texts [[1]]. In this paper, after a
    short introduction to the concept of parsers and grammatical editors,
    I expressed my personal conviction that: a) both kinds of computer
    tools should be developed fast for the good of Greek and Latin
    studies; b) projects should start sharing the results obtained with
    such tools, i.e. the annotated corpora; c) considering the needs of
    classicists today, grammatical editors would probably be their
    preferred choice. [A grammatical editor is a program that offers
    i) the interface allowing a human operator to grammatically parse a
    text on a computer, facilitating some of the tasks involved; ii) the
    tools to store, search, retrieve, and compile statistics about the
    text corpora thus parsed; iii) the interface to present to the end
    user the results of any of the above-mentioned operations.]
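
    Point (ii) of that definition (store, search, retrieve, compile
    statistics) can be sketched in a few lines. This is only a
    minimal illustration under my own assumptions: the Token fields,
    the Corpus class and its method names are hypothetical, and it is
    not a description of how Aristarchus or any other existing editor
    works.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str      # word as it appears in the text
    lemma: str     # dictionary form
    pos: str       # part of speech
    relation: str  # syntactic function assigned by the human annotator


@dataclass
class Corpus:
    sentences: list = field(default_factory=list)

    def add_sentence(self, tokens):
        """Store one manually parsed sentence."""
        self.sentences.append(list(tokens))

    def search(self, **criteria):
        """Retrieve every token whose fields match all given criteria."""
        return [tok for sent in self.sentences for tok in sent
                if all(getattr(tok, key) == value
                       for key, value in criteria.items())]

    def relation_counts(self):
        """Compile a simple statistic: frequency of each syntactic function."""
        return Counter(tok.relation for sent in self.sentences for tok in sent)


corpus = Corpus()
corpus.add_sentence([
    Token("ἄνθρωπος", "ἄνθρωπος", "noun", "subject"),
    Token("γράφει", "γράφω", "verb", "predicate"),
])
subjects = corpus.search(relation="subject")
counts = corpus.relation_counts()
```

    A real editor would add the two interfaces, (i) and (iii), on top
    of such a store, and would need a far richer annotation scheme.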

            Conclusion (c), earnest as it was, was not, however, entirely
    disinterested: in the same paper I presented my own grammatical
    editor, called Aristarchus. If anybody is interested, I can send
    them the PDF version of this paper.

            I owe to Kathleen McNamee a piece of information that is
    tangential to this point but nonetheless very interesting: the
    release (due in 2003) of the "Commentaria et Lexica Graeca in Papyris
    reperta", a major edition of ancient grammatical commentaries edited
    by G. Bastianini, H. Maehler, M. Haslam, and C. Römer. Thanks!

    [[1]] Riaño Rufilanchas, Daniel. 1998. Análisis y etiquetado
    sintáctico del corpus de los textos clásicos: modelos y perspectivas.
    Studia Iranica, Mesopotamica & Anatolica 3:107-129.

    >I wrote:
    >Dear List,
    >
    >I am collecting information about the existing corpora of
    >grammatically annotated texts in Greek and Latin, the tools used for
    >the annotation or the edition of the texts to be analysed, and the
    >schemes of grammatical annotation. Any information, in or off list,
    >would be greatly appreciated.
    >
    >I am aware of the main projects for the grammatical annotation of
    >several Biblical texts and mainly the Greek New Testament
    >(Accordance, BibleWorks) and of course the Perseus Project, but
    >would also appreciate any bibliographical aid or useful link to the
    >technical description of the grammatical background and the
    >annotation schemes of such projects. I'd also appreciate any
    >information about present and future projects in the same field
    >and/or in other ancient languages, too. Any province of the
    >grammatical oecumene is relevant. If there is enough off list input
    >about the matter, I will summarise to the list(s). Many thanks in
    >advance,
    >
    >Daniel

    -- 
    ~~~~~~~~~~~~~~~~~~~
    Daniel Riaño Rufilanchas
    Madrid, España
    


