Re: Corpora: Frequency Meaning

From: James L. Fidelholtz (jfidel@siu.buap.mx)
Date: Sun Feb 20 2000 - 05:00:29 MET

    On Thu, 17 Feb 2000, Pascual Cantos wrote:

    >I was recently wondering about the usefulness of using frequency data in
    >order to classify types/lemmas in various frequency layers, say:
    >
    > - Very Low
    > - Low
    > - Moderate
    > - High
    > - Very High
    >
    >What criteria would you suggest to carry this out?

    Dear Pascual:
            I don't have any very recent info for you, but I did publish an
    article in 1975 on English vowel reduction, which contains some
    suggestive data for part of your question (at least for English,
    although I would have to be convinced that frequency phenomena are
    significantly different in this regard for different languages). Now,
    there is a pretty clear dividing line at about 4/M (plus or minus about
    3/M) between words with reduced vowels in certain environments and
    words with vowels unreduced in those environments (of course, the more
    frequent words show a greater tendency toward reduction). It seems to me that
    this would probably correspond to the difference between 'medium' and
    'low', but a lot depends on how you define these categories. Here, the
    evidence is overwhelmingly strong, in my opinion. There is some fairly
    weak evidence (from other environments with relatively few examples) for
    another dividing line somewhere around 35-50/M, which might correspond
    to the 'moderate'/'high' division, although my feelings are less strong
    on various aspects of this decision.
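
A minimal sketch of how the two dividing lines mentioned above could be turned into frequency layers. The function names are hypothetical, and the upper cut-off of 40/M is only an assumed midpoint of the 35-50/M range; the lower cut-off of 4/M carries the stated uncertainty of about 3/M either way. Only three layers are assigned, since the evidence described here does not bear on 'very low' or 'very high'.

```python
def per_million(count, corpus_size):
    """Convert a raw count into occurrences per million running words."""
    return count * 1_000_000 / corpus_size


def frequency_band(freq_per_million, low_cut=4.0, high_cut=40.0):
    """Assign a frequency layer from a per-million frequency.

    low_cut  : about 4/M, the dividing line described as strongly supported
    high_cut : assumed midpoint of the weakly supported 35-50/M range
    """
    if freq_per_million < low_cut:
        return "low"
    if freq_per_million < high_cut:
        return "moderate"
    return "high"
```

For instance, a word seen 18 times in an 18-Mword corpus comes out at 1/M and so lands in the 'low' layer under these assumed cut-offs.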
            No doubt others will have different ideas on what these
    differences correspond to, based on totally different analyzed data, but
    maybe we can get at some consensus about what these categories (or a
    smaller number of categories, perhaps) might correspond to
    psychologically. This last word is important, as there seem to exist
    various factors which may make a relatively infrequent word
    psychologically more salient, or vice versa (e.g., 'berserk' is actually
    almost never encountered in the earlier, pre-computer word counts
    [corpora of a few hundred Kwords to about 18 Mwords], and nevertheless
    acts phonologically in some ways like a 'medium'-frequency word--there
    is something about its phonological shape [apparently] which makes it
    extremely salient for English speakers).
            By the way, there is also some evidence in the article which
    raises the question of whether, in at least some cases, nonautomatic
    morphophonemic alternation may produce distinct lexical entries, for at
    least some effects (specifically, the first vowel in the verb 'mistake'
    reduces, but the past tense 'mistook' usually has the first vowel
    unreduced, since the two forms fall on opposite sides of the
    'familiar/unfamiliar' frequency dividing line). It is data like these
    that make me interested in frequency counts of forms rather than
    lexemes.
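
To make the forms-versus-lexemes distinction concrete, here is a small sketch that counts the same tokens both ways. The lemma table is hypothetical and hand-made for illustration only; a real count would need a proper lemmatizer, and the function names are not from any particular tool.

```python
from collections import Counter

# Hypothetical, hand-made lemma table (illustration only).
LEMMA_OF = {
    "mistake": "mistake",
    "mistakes": "mistake",
    "mistook": "mistake",
}


def count_forms(tokens):
    """Count each word form separately ('mistake' and 'mistook' stay apart)."""
    return Counter(t.lower() for t in tokens)


def count_lexemes(tokens, lemma_of=LEMMA_OF):
    """Count by lexeme: forms in the table are pooled under their lemma."""
    return Counter(lemma_of.get(t.lower(), t.lower()) for t in tokens)
```

Counting by form keeps 'mistake' and 'mistook' on separate tallies, so they can fall on opposite sides of a frequency dividing line; counting by lexeme pools them and hides that difference.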

            The article reference is as follows:
    Fidelholtz, James L. 1975. Word frequency and vowel reduction in
    English. _Chicago linguistic society. Regional meeting. Papers_
    11.200-213.
            At some point in the future, there will be an electronic version
    of this article available on the Web, but I can't promise when. I will
    let you know when it is available.
            Jim

    James L. Fidelholtz e-mail: jfidel@siu.buap.mx
    Maestría en Ciencias del Lenguaje
    Instituto de Ciencias Sociales y Humanidades
    Benemérita Universidad Autónoma de Puebla, MÉXICO


