Corpora: Annotated Old English corpus now available

From: Susan Pintzuk (sp20@york.ac.uk)
Date: Sun Aug 27 2000 - 14:26:28 MET DST

          The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus
                            of Old English

    The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
    (henceforth the Brooklyn Corpus) is a selection of texts from the Old
    English Section of the Helsinki Corpus of English Texts, annotated to
    facilitate searches on lexical items and syntactic structure. It is
    intended for the use of students and scholars of the history of the
    English language. The Brooklyn Corpus contains 106,210 words of Old
    English text; the samples from the longer texts are 5,000 to 10,000 words
    in length. The texts represent a range of dates of composition, authors,
    and genres. The texts in the Brooklyn Corpus are syntactically and
    morphologically annotated, and each word is glossed. The size of the
    corpus is approximately 12 megabytes.

    The syntactic annotations enable users to pose and answer questions
    about word order, constituent order, abstract structure, and syntactic and
    morphological characteristics of the texts in the corpus. The annotations
    are general-purpose and as theory-neutral as possible, while still
    incorporating the insights of modern linguistic theory; they can be used
    by scholars with widely varying research interests. The syntactic
    annotations mark constituents, both clausal and non-clausal, by labelled
    brackets, with some relations marked by empty categories. The structure
    assigned to a sentence by the labelled bracketing can be quite complex,
    but it is not a complete syntactic analysis: the purpose of the
    bracketing is not to assign a definitive structure to Old English
    sentences but rather to facilitate searches.
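
    As a rough illustration of how labelled bracketing supports searching,
    the sketch below parses a bracketed string into a nested structure and
    collects every constituent carrying a given label. The bracket notation,
    the labels (S, NP, VP, and so on), the placeholder words, and the function
    names are all assumptions made for this example; the actual annotation
    format and search scripts of the Brooklyn Corpus are documented in the
    Corpus Manual and may differ.

```python
import re

# Hypothetical labelled-bracket string; NOT taken from the Brooklyn Corpus.
SAMPLE = "(S (NP (PRO he)) (VP (V saw) (NP (D the) (N king))))"


def parse_brackets(text):
    """Parse a labelled-bracket string into nested (label, children) tuples."""
    # Tokens are opening brackets, closing brackets, or runs of other characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # the first token after '(' is the label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the closing bracket")
    return tree


def find_label(node, label):
    """Yield every constituent with the given label, outermost first."""
    node_label, children = node
    if node_label == label:
        yield node
    for child in children:
        if isinstance(child, tuple):
            yield from find_label(child, label)


def words(node):
    """Return the words dominated by a constituent, in text order."""
    out = []
    for child in node[1]:
        if isinstance(child, tuple):
            out.extend(words(child))
        else:
            out.append(child)
    return out


if __name__ == "__main__":
    tree = parse_brackets(SAMPLE)
    for np in find_label(tree, "NP"):
        print(" ".join(words(np)))
```

    A search for a word-order pattern would follow the same idea: walk the
    nested structure and test the order of the labels among a constituent's
    children.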

    The Brooklyn Corpus is available without fee for educational and research
    purposes, but it is not in the public domain. More information about the
    Brooklyn Corpus and how to access it is available at
    http://www-users.york.ac.uk/~sp20/corpus.html. The Brooklyn Corpus
    Manual may be downloaded without restriction, but the corpus texts and
    search scripts are available only to users who formally agree to the
    conditions of use.

    Susan Pintzuk
    Department of Language and Linguistic Science
    University of York
    Heslington, York YO1 5DD
    United Kingdom
    sp20@york.ac.uk
    Telephone: +44 1904 432661


