Corpora: Help on Frequencies

From: Pascual Cantos (pcantos@fcu.um.es)
Date: Fri Oct 05 2001 - 11:40:59 MET DST

    Dear List Members,

    Many corpus-based applications in foreign-language teaching materials and
    dictionary making, among others, rely mostly on raw frequencies (absolute
    and/or relative) of word forms, lemmas, bigrams, etc. These frequency
    indices are used to decide whether or not an item should be included.
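
As a minimal sketch of the two raw measures named above, the snippet below counts absolute frequencies and normalizes them to relative frequencies per million tokens. The function name `frequencies`, the `per` parameter, and the toy sentence are illustrative assumptions only; whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def frequencies(tokens, per=1_000_000):
    """Return {item: (absolute count, relative frequency per `per` tokens)}."""
    counts = Counter(tokens)      # absolute frequency of each item
    total = len(tokens)           # corpus size in tokens
    # An empty token list yields an empty dict, so no division by zero occurs.
    return {item: (n, n / total * per) for item, n in counts.items()}


# Toy "corpus" (hypothetical); a real corpus would need proper tokenization.
sample = "the cat sat on the mat".split()
freqs = frequencies(sample)
```

Normalizing by corpus size makes counts from corpora of different sizes comparable, but it says nothing about how evenly an item is spread across texts.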

    And here are my questions:
    What exactly do frequencies tell us?
    More interestingly, what do they hide?
    How misleading or erroneous can they be?
    How far can we rely on them?
    What other features, aspects, or measures should also be considered?
    Are there statistical techniques to "correct" frequency indices?

    I would greatly appreciate ideas, comments, and literature on this issue.
    I also promise to send a summary of all the replies I receive.

    Best regards and a million thanks,

    Pascual

    -----------------------------------------------------
    Dr. Pascual Cantos Gómez

    Departamento de Filología Inglesa
    Universidad de Murcia
    C/. Santo Cristo, 1
    30071 Murcia (Spain)

    Tel.: +34 968 364365
    Fax: +34 968 363185
    E-mail: pcantos@fcu.um.es
    http://www.um.es/lacell/miembros/pcg/


