sum:frequency

Marcial.Terradez@uv.es
Mon, 16 Jun 1997 11:15:56 +0100

Some weeks ago, I posted a query on the LINGUIST List about frequency
vocabularies of English, French, German and Spanish. Many people responded
with helpful comments, which are summarised below.
Thanks to everybody who wrote to me. Your suggestions and information are
very important for my work.

My name is Erik Willis and I attend Brigham Young University as a Masters
student in Spanish. One of our professors, Orlando Alba
(Orlando_Alba@byu.edu), is very active in frequency counts. I know his
teacher Humberto Lopez Morales was also very active in that field. Their
respective corpora are based on the Dominican Republic, Puerto Rico and, I
believe, Mexico, and were based on lexical availability (lexico
disponible). So far I don't think they have anything on the net. The person
who knows the net resources best is Francisco Marcos Marin at the Autonoma
de Madrid. I don't have his e-mail.
I am also working with frequency counts, but at a phonological level. I am
looking at written and oral narratives, which I believe has not been done
before. I hope we can help each other with bibliographies, etc.
Erik Willis
willisew@itsnet.com
---------------------------------------------------
Dear Marcial:

There are several existing counts already, among them:

Helen Eaton, ca. 194?. (I forget the title, but it is something like:
Frequency counts in 5 European languages. I don't know who published it
originally, but Dover Press reissued it in paperback around the 60s or
70s.)

Luis Fernando Lara at El Colegio de México has done a lot along these
lines (based on texts selected from a total of [I think] 2 million words
of running text). He is at the DEM [Diccionario del español de México],
and is currently the director of the CELL [Centro de Estudios de
Lingüística y Literatura] of El Colegio de México (e-mail:
lara@colmex.mx, although I am not 100% sure of the prefix). He can give
you a lot of advice on this. There are also many corpus-analysis
researchers in Spain itself, although I can't recall their names at the
moment.

In the medium term I will undertake a project with a similar purpose, but
with a gigaword corpus, in order to investigate the use of word forms
(e.g., the future subjunctive, etc.) in some detail, as well as proper
names, etc. However, I have not done much on it so far.

Jim
James L. Fidelholtz                 e-mail: jfidel@siu.cen.buap.mx
Área de Ciencias del Lenguaje       or:     jfidel@cca.pue.udlap.mx
Instituto de Ciencias Sociales y Humanidades
Universidad Autónoma de Puebla, México
---------------------------------------------------------------
Dear Marcial,
A colleague of mine at the Universidad de Oviedo has just published a
frequency dictionary of Spanish. His address is:
Jose Ramon Alameda <jalameda@sci.cpd.uniovi.es>
As for the dictionary you are going to compile, do you plan to tag the
words? That is, will you distinguish between the occurrences of 'casa'
that come from the noun 'casa' ('house') and those that come from the verb
'casar' ('to marry')?
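
The distinction raised in this question, counting homographs separately by
part of speech, can be sketched as follows. This is only a minimal
illustration in Python, assuming the corpus has already been tagged by hand
or by some tagger; the `tagged_tokens` sample, the tag names and the function
names are hypothetical and are not taken from any of the dictionaries
mentioned in this summary.

```python
from collections import Counter

# Hypothetical hand-tagged sample: (word form, lemma, part of speech).
# A real corpus would get these tags from a tagger or manual annotation.
tagged_tokens = [
    ("casa", "casa", "NOUN"),    # "la casa es grande"
    ("casa", "casar", "VERB"),   # "el juez casa a la pareja"
    ("casa", "casa", "NOUN"),
    ("casas", "casa", "NOUN"),
    ("casas", "casar", "VERB"),  # "te casas"
]


def form_frequencies(tokens):
    """Count raw word forms, ignoring tags (homographs are merged)."""
    return Counter(form for form, _lemma, _pos in tokens)


def tagged_frequencies(tokens):
    """Count (form, lemma, part of speech) triples, keeping homographs apart."""
    return Counter(tokens)


if __name__ == "__main__":
    print(form_frequencies(tagged_tokens))
    print(tagged_frequencies(tagged_tokens))
```

An untagged count gives a single figure for 'casa', while the tagged count
splits it between the noun and the verb, which is the difference the question
is about.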
David Eddington
Mississippi State University
----------------------------------------------------
I used two frequency lists in research I conducted almost 20 years ago.
One is the Keniston List, 2000 words divided into groups of 500, for
frequency of words in print in Peninsular Spanish. The other is Rodriguez
and Bou, for frequency of words in print in Puerto Rican Spanish.
Joel Walters
Department of English
Bar-Ilan University
Ramat Gan, Israel
------------------------------------------------------
I produced the frequency list for Longman's Dictionary. Both the
paper and assorted frequency lists are available from my web page
(see below).

If you have trouble accessing the paper, feel free to email me again
and I'll send it.

    Happy surfing,

        Adam

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
Senior Research Fellow                      tel:   (44) 1273 642919
Information Technology Research Institute          (44) 1273 642900
University of Brighton                      fax:   (44) 1273 642908
Lewes Road
Brighton BN2 4GJ                            email: Adam.Kilgarriff@itri.bton.ac.uk
UK                                          http://www.itri.bton.ac.uk/~Adam.Kilgarriff
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
---------------------------------------------------------------------
Log in by anonymous ftp to ftp-lsi.upc.es
and change to the directory pub/lluisp

There you will find the files

spanish.freq   (Spanish word frequencies
                drawn from a corpus of 3M words)

wsj.freq       (English word frequencies drawn
                from 1.1M words of the WSJ)

You have to uudecode and gunzip the files.
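
For readers without the uudecode and gunzip command-line tools, the two
decoding steps can be sketched in Python. This is a rough sketch using only
the standard `binascii` and `gzip` modules; the function names are
hypothetical, and it assumes each downloaded file is a single ordinary
uuencoded block (a "begin" line, data lines, an "end" line) wrapping
gzip-compressed data, which has not been checked against the actual files
on the server.

```python
import binascii
import gzip


def uudecode(text):
    """Decode the body of a uuencoded text and return the raw bytes."""
    out = bytearray()
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Skip everything up to the "begin <mode> <name>" header line.
            in_body = line.startswith("begin ")
            continue
        if line.strip() == "end":
            break
        if not line.strip():
            continue  # blank or space-only terminator line
        out += binascii.a2b_uu(line.encode("ascii"))
    return bytes(out)


def uudecode_and_gunzip(text):
    """Undo both layers: uuencoding first, then gzip compression."""
    return gzip.decompress(uudecode(text))
```

Assuming a file was saved locally as spanish.freq, the decoded word list
would be obtained with something like
`uudecode_and_gunzip(open("spanish.freq", encoding="ascii").read())`.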

    Good luck,

        Lluis Padro
-------------------------------------------------
Hello Marcial:

Although you very probably have them already, I am sending you the
references I have to hand on lexical frequencies in Spanish, in case
they can help you:

PATTERSON, William; and URRUTIBEHEITY, Hector, _The Lexical Structure of
Spanish_, Mouton, The Hague-Paris, 1975.

JUILLAND, Alphonse; and CHANG-RODRIGUEZ, Eugenio, _Frequency dictionary of
Spanish words_, Mouton, London-The Hague-Paris, 1964.

PATTERSON, William T., "On the genealogical structure of the Spanish
vocabulary", in ???, pp. 309-339.

GARCIA HOZ, Victor, _Estudios experimentales sobre el vocabulario_, CSIC,
Madrid, 1977.

______________________
Javier Gomez Guinovart <uvifejgg@cesga.es>
http://www.uvigo.es/departamentos/dep/h06/webh06/sli/index.html
Univ. de Vigo - Fac. de Humanidades - Apartado 874 - E-36200 Vigo
Tel: +34+86+812360 - Fax: +34+86+812380
-----------------------------------------------------------
I have a copy of:

An English-French-German-Spanish Word Frequency Dictionary
Subtitle: A correlation of the first 6000 words in four single-language
frequency lists
Compiled by Helen S. Eaton, Teachers College, Columbia Univ;
visiting instructor, Univ of New Mexico;
Diplomee, Sorbonne, Universite de Paris
441 pages, paperback, Dover Publications, Inc, New York.
copyright 1940, 1967 by Helen S. Eaton

There are separate indexes for English, French, German and Spanish words.
Appendix II is a conceptual analysis of substantives, verbs and adjectives
in the list.

Pub in Canada by General Publ Co Ltd, 30 Lesmill Road, Don Mills,
Toronto, Ontario
Pub in UK by Constable and Co, Ltd, 10 Orange St, London, W.C. 2
Pub in US by Dover Publications Inc, 180 Varick St, New York, NY 10014
LCCN: 61-4487

/s/ Israel Cohen
New Dimension Software Ltd
izzy@telaviv.ndsoft.com