Re: [Corpora-List] frequency lists: Hungarian

From: Viktor Tron (v.tron@ed.ac.uk)
Date: Fri Apr 09 2004 - 15:19:48 MET DST


    Hello,

    As for Hungarian, I think I can help.
    For instance, I can send you a file with around 18,000 prefixless verb
    stems, with the following fields:

    1. rank (of text frequency)
    2. frequency count in a web corpus of about 10 million tokens.
    3. verb stem <dictionary form, i.e., present tense 3sg indefinite-obj>
    4. same as 3?
    5. number of alternative stems (not very informative)
    6. number of different prefixes the stem occurs with
    7. number of suffixes (i.e., suffix clusters) the stem occurred with
    8. orthographic family size: the number of all different verbal wordforms
        that are derived from this stem
            (any combination of added prefixes, suffixes, and capitalization patterns)
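
    As a rough illustration of how a list with these eight fields could be
    read, here is a minimal Python sketch. The post does not specify the file
    format, so the tab-separated layout, the name VerbStemEntry, the function
    parse_stem_list, and the sample rows are all assumptions made for the
    example, not properties of the actual data.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class VerbStemEntry:
    """One row of the stem list; field names are hypothetical."""
    rank: int               # 1. rank by text frequency
    frequency: int          # 2. frequency count in the web corpus
    stem: str               # 3. verb stem (dictionary form)
    stem_alt: str           # 4. described in the post as "same as 3?"
    n_alt_stems: int        # 5. number of alternative stems
    n_prefixes: int         # 6. number of different prefixes
    n_suffix_clusters: int  # 7. number of suffix clusters
    family_size: int        # 8. orthographic family size


def parse_stem_list(lines):
    """Parse tab-separated lines (an assumed layout) into VerbStemEntry records."""
    entries = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 8:
            continue  # skip blank or malformed lines
        rank, freq, stem, stem_alt, alt, pre, suf, fam = row
        entries.append(
            VerbStemEntry(int(rank), int(freq), stem, stem_alt,
                          int(alt), int(pre), int(suf), int(fam))
        )
    return entries


# Invented placeholder rows for illustration only; not real corpus counts.
SAMPLE = (
    "1\t52000\tstem_a\tstem_a\t3\t12\t40\t310\n"
    "2\t31000\tstem_b\tstem_b\t2\t5\t35\t120\n"
)

entries = parse_stem_list(io.StringIO(SAMPLE))
# Example query: stems that combine with at least 10 different prefixes.
many_prefixes = [e.stem for e in entries if e.n_prefixes >= 10]
```

    If the real file uses a different delimiter or field order, only the
    delimiter argument and the unpacking line would need to change.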

    If you need lists where the different prefixed versions are not stripped
    (this might make sense, since different prefixed versions of the same
    alleged stem often have very different meanings), or more specific
    details, just write to me.

    Disclaimer: the data and counts were obtained automatically, so the
    actual counts might be erroneous due to some systematic ambiguities.
    The basic pattern, however, is reliable, I reckon.

    If you use this data, please refer to the Szoszablya project
    www.szoszablya.hu

    Best
    Viktor Tron
    +------------------------------------------------------------------+
    |Viktor Tron v.tron@ed.ac.uk|
    |3fl Rm8 2 Buccleuch Pl EH8 9LW Edinburgh Tel +44 131 650 4414|
    |European Postgraduate College www.coli.uni-sb.de/egk|
    |School of Informatics www.informatics.ed.ac.uk|
    |Theoretical and Applied Linguistics www.ling.ed.ac.uk|
    | @ University of Edinburgh, UK www.ed.ac.uk|
    |Dept of Computational Linguistics www.coli.uni-sb.de|
    | @ Saarland University (Saarbruecken, Germany) www.uni-saarland.de|
    |use LINUX and FREE Software www.linux.org|
    +------------------------------------------------------------------+

    On Fri, 9 Apr 2004 14:04:56 +0200, Milena Slavcheva <milena@lml.bas.bg> wrote:

    > Dear Corpora List Members,
    >
    > I am looking for downloadable lists of frequently used verbs in:
    > - French;
    > - Hungarian;
    > - German.
    >
    > I would be grateful if you could provide me with information about such
    > resources.
    >
    > Best regards,
    >
    > Milena Slavcheva
    >
    > Milena Slavcheva
    >
    > Linguistic Modeling Laboratory
    > Institute for Parallel Processing
    > Bulgarian Academy of Sciences
    > 25A, Acad. G. Bonchev St.
    > 1113 Sofia, Bulgaria
    >
    > Phone: (+359 2) 979 2812
    > Fax: (+359 2) 70 72 73
    > E-mail: milena@lml.bas.bg


