Corpora: Peer evaluation of (web-based) corpora and other materials

From: Mark Davies (mdavies@ilstu.edu)
Date: Thu Apr 06 2000 - 00:19:48 MET DST

    I hope this post is not too far off-topic, although it does pertain to the
    type of (web-based) corpora that many of us are creating for use inside and
    outside of academia. I also apologize for any cross-postings; I am sending
    this to two other lists as well.

    ------------------

    We are in the process of revising the promotion and evaluation procedures
    in all of the departments at Illinois State University, and I have been
    asked by a committee to get input from individuals at other institutions
    concerning how non-peer-reviewed, web-based materials could/should be
    evaluated by the institution.

    Perhaps I can provide some concrete examples of the type of issues that the
    committee is looking at. In my case, I have created several online corpora
    that have been used by researchers and students at other
    institutions. These include a "Polyglot Bible"
    (http://mdavies.for.ilstu.edu/bible) that allows users to search for a word
    in the entire Gospel of Luke in one of thirty languages and see all of the
    hits, along with (most importantly) the parallel passages for other related
    languages (e.g., Gothic, Old English, Icelandic, German, etc.), which allows
    cross-linguistic comparison. A more extensive version, which includes
    nearly the entire Bible, is also available for just Latin, Old Spanish,
    and Modern Spanish (http://mdavies.for.ilstu.edu/bible/span3.htm).

    More important for the type of issues the committee is looking at, I have
    created a searchable, web-based corpus of 3,000,000 words of historical
    Spanish texts (1200s-1900s) (http://mdavies.for.ilstu.edu/corpus), and I
    will soon start work on a web-based 100,000,000-word corpus of historical
    Spanish, based in large part on other available electronic corpora, but
    with enhanced search features and tied in with other linguistic tools (word
    frequencies, dictionaries, bibliographical information, etc.). In each
    case, the materials have been used by many researchers and students at
    other institutions.

    In the evaluation of materials such as these, the committee wants to know
    what the procedures and policies are at other institutions. For example:

    1a) In general terms, are materials that are not peer-reviewed at the
    outset (but rather are simply something that a researcher has created and
    put on the web, and that only later receive some type of external
    validation) considered for promotion and evaluation?

    1b) If so, at what level are they considered -- that of books, journal
    articles, book reviews, or potentially any of these levels, depending on
    the quality of the materials?

    2a) Since they are not peer-reviewed at the outset, is the faculty member
    expected to provide documentation to show how they have been used and
    accepted by peers at other institutions?

    2b) If so, what form would this documentation take -- logfiles showing the
    number of hits, email from many different users, comments from a selected
    set of peers, etc.?

    3) Many of these materials would be used by both researchers _and_ students
    at other institutions -- probably much more than a journal article, which
    would be primarily used by other researchers. Therefore, how can one avoid
    "double-dipping", by including these materials in both the "scholarly" and
    the "teaching" categories, for those institutions that organize things
    thusly? In other words, would developers need to document and prove that
    one group or the other (scholars or students) is the main user of the
    resource?

    I would very much appreciate your comments on any of these questions
    (mdavies@ilstu.edu). Although I will most likely just be summarizing the
    responses for presentation to the committee, please feel free to indicate
    if you would like your comments to be anonymous.

    Thanks in advance for your help.

    Mark Davies

    =======================================
    Mark Davies, Associate Professor, Spanish Linguistics
    Dept. of Foreign Languages, Illinois State University
    Normal, IL 61790-4300

    Voice:309/438-7975 email:mdavies@ilstu.edu
    Fax:309/438-8038 http://mdavies.for.ilstu.edu/personal/
    =======================================


