Corpora: word-formation systems

From: Anke Lüdeling (aluedeli@uos.de)
Date: Wed May 08 2002 - 14:52:55 MET DST

    Dear list members,

    last week I asked for information about computational morphology
    systems that deal with word-formation. I received a number of helpful
    replies - thank you very much.

    I would like to thank the following colleagues:

    Antti Arppe
    Janne Bondi Johannessen
    Rodolfo Delmonte
    Sergei A. Koval
    Kemal Oflazer
    Dan Tufis
    Alexander S. Yeh

    Below I have summarized, by language, the responses I got together
    with the information on word-formation systems that I already had. I
    have given the URL where available and commented where I could (I
    haven't yet had the time to look at all the links and papers that were
    provided, but I will certainly try to do so).

    --- English

    ALE-RA http://nl.ijs.si/et/Thesis/ALE-RA/

    "Alexander S. Yeh" wrote:

    > UMLS (Unified Medical Language System?) is a U.S. Government program that
    > provides among other things, a free morphological variation system for
    > mainly English medical terms.

    --- Finnish (& other languages)

    Antti Arppe wrote:

    > A Finnish language technology company, Lingsoft <www.lingsoft.fi> has
    > used their morphological models (based on the two-level principle and
    > model by Koskenniemi) for generating inflected word forms in
    > inflecting thesauri, i.e. synonym dictionaries that can handle the
    > inflected forms of the synonyms as well. The languages that were
    > covered are Finnish, Swedish, Norwegian (bokmål), Danish and German.
    >
    > There's a short presentation on part of this in the Proceedings of the
    > 17th Scandinavian Conference of Linguistics: Arppe, Antti; Voipio,
    > Mari; Würtz, Malene 2000. Creating Inflecting Electronic Thesauri. In
    > Lindberg, Carl-Erik & Nordahl Lund, Steffen 2000. 17th Scandinavian
    > Conference of Linguistics Odense Working Papers in Language and
    > Communication, No. 19, Vol. I, Institute of Language and
    > Communication, University of Southern Denmark.
    >
    > In the case of these software tools, the generation was geared for the
    > (limited) synonym content. In principle the same models could be
    > applied for the language as a whole, but there are a variety of
    > factors that have to be considered in such a case, e.g. variant
    > inflected forms and errors in the underlying linguistic model which
    > become apparent only when generation is applied.
    >
    > Though I have been talking here mostly about inflection, specifically
    > the Finnish model has had a version where both derivations and
    > inflections can be generated from root words, e.g.
    >
    > ympäri+dv-oida+dn-minen+nom+sg > ympäröiminen
    > around+verbalize+nominalize+nominative+singular > encirclement
    >
    > I believe that this could be adapted rather easily to the other
    > languages as well, since they're all based on the same theoretical
    > principle, i.e. the TWOL model which allows to be used for both
    > morphological analysis and generation. Nevertheless, Lingsoft has not
    > been otherwise very active regarding these tools, as far as I know.
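
    To make the notation in the quoted example concrete, here is a minimal
    Python sketch that splits such an analysis string on "+" and maps each
    tag to its English gloss. The two lookup tables and the function name
    are assumptions made for illustration; they cover only the tags in this
    one example and are not Lingsoft's tag inventory. The sketch reads the
    analysis string only. Producing the surface form "ympäröiminen" would
    need the two-level rules themselves, which are not reproduced here.

```python
# Hypothetical lookup tables covering only the example from the message;
# they are not taken from the Lingsoft/TWOL tag set.
ROOT_GLOSSES = {"ympäri": "around"}
TAG_GLOSSES = {
    "dv-oida": "verbalize",    # derivation: makes a verb
    "dn-minen": "nominalize",  # derivation: makes a noun
    "nom": "nominative",
    "sg": "singular",
}


def gloss_analysis(analysis: str) -> str:
    """Turn 'root+tag+tag...' into a '+'-joined English gloss.

    Unknown roots or tags are passed through unchanged.
    """
    root, *tags = analysis.split("+")
    parts = [ROOT_GLOSSES.get(root, root)]
    parts.extend(TAG_GLOSSES.get(tag, tag) for tag in tags)
    return "+".join(parts)


print(gloss_analysis("ympäri+dv-oida+dn-minen+nom+sg"))
# prints: around+verbalize+nominalize+nominative+singular
```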

    Comment: I am familiar with GerTWOL, the German version of TWOL. A link is given below.

    --- German

    DeKo (Derivation und Komposition; IMS, University of Stuttgart), the project I worked in :-) http://www.ims.uni-stuttgart.de/projekte/DeKo

    Projekt Deutscher Wortschatz (University of Leipzig): http://wortschatz.uni-leipzig.de

    Deutsche Malaga Morphologie (University of Erlangen): http://www.linguistik.uni-erlangen.de/~orlorenz/DMM/DMM.html

    CISLEX (University of Munich): http://www.cis.uni-muenchen.de/projects/CISLEX.html

    GerTWOL (Lingsoft Inc.): http://www.lingsoft.fi/cgi-bin/gertwol

    There is also a German version of WordManager (University of Basel & Canoo): http://www.wordmanager.com

    --- Italian

    Rodolfo Delmonte wrote:

    > As to the morphology word formation system, of course we have our
    > system for Italian (IMMORTALE) that generates/analyses derivations
    > besides inflections. But no compound word, at least not yet. Even
    > though we could regard cliticized verbs as a special type of compound
    > word,
    > - lasciamoglielo / (let's) leave it to him
    > it requires clitic stripping and then inflection stripping, perhaps
    > with derivation stripping too, in case the verb is not included in
    > the dictionary list.
    > There's a number of published papers on it, they are listed in my website.
    > website: http://project.cgm.unive.it
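
    The order of operations described above (clitic stripping first, then
    inflection stripping) can be sketched in Python as follows. This is a
    toy illustration only and not IMMORTALE's algorithm: the clitic and
    ending lists, and all function names, are assumptions chosen so that
    the single example "lasciamoglielo" goes through.

```python
# Toy inventories, assumed for illustration; a real system would rely on
# a full lexicon and much larger tables.
CLITICS = ("glie", "gli", "lo", "la", "li", "le", "ne", "mi", "ti", "ci", "vi", "si")
ENDINGS = ("iamo", "ate", "ete", "ite")


def strip_clitics(word):
    """Peel enclitic pronouns off the end of the word, longest match first."""
    clitics = []
    stripped = True
    while stripped:
        stripped = False
        for clitic in CLITICS:  # tuple is ordered longest first
            if word.endswith(clitic) and len(word) > len(clitic):
                clitics.insert(0, clitic)  # keep left-to-right order
                word = word[: -len(clitic)]
                stripped = True
                break
    return word, clitics


def strip_inflection(word):
    """Remove one known verbal ending, if any."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending
    return word, ""


def analyse(word):
    """Clitic stripping followed by inflection stripping."""
    verb_form, clitics = strip_clitics(word)
    remainder, ending = strip_inflection(verb_form)
    return remainder, ending, clitics


print(analyse("lasciamoglielo"))
# prints: ('lasc', 'iamo', ['glie', 'lo'])
```

    Without a dictionary check, blind suffix stripping of this kind
    over-applies: a noun such as "tavolo" would lose its final "lo". That
    is presumably why the message ties the stripping steps to whether the
    verb is found in the dictionary list.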

    --- Norwegian

    Janne Bondi Johannessen wrote:

    > For Norwegian, we have a compound analyser that also analyses
    > productive derivation as part of our morphological tagger. It can be
    > tested at: http://decentius.hit.uib.no:8005/cl/cgp/test.html

    --- Romanian

    Dan Tufis wrote:

    > For Romanian I can give you at least three examples:
    > 1) Dan Cristea's morphological analyser/generator in the early 1980's
    > 2) my PARADIGM morphology learning system
    > (described in the EACL89 proceedings: Tufis D., "It Would Be Much Easier If
    > WENT Were GOED",
    > in Harry Somers, Mary McGee Wood (eds.), Proceedings of the 4th EACL,
    > Manchester, 1989, pp.145-152
    > and in EACL91: Tufis D., Popescu O., "A Unified Management and Processing of
    > Word-Forms, Idioms and Analytical Compounds", in Jurgen Kunze and Dorothy
    > Reinman (eds.), Proceedings of the 5th EACL, Berlin, 1991, pp.95-100)
    > 3) Dan Cristea's MICH classification-based system
    > (described in Dan Cristea (1994): The Classification Language MICH, Research
    > Report, LIMSI-CNRS, Universite Paris-Sud, Orsay.
    > Dan Cristea (1993): The generation of Romanian Morphology. Research Report.
    > University of Edinburgh).
    >
    > There is a new C-based PC-implementation of the LISP system 1) due to Stefan
    > Andrei of University A.I. Cuza in Iasi
    > (described in Andrei, St.: A Morphological Analyser for Romanian Language.
    > The First EUROLAN Summer School
    > in Natural Language Processing, Iasi - Romania, July 19-29, 1993)

    --- Russian

    "Sergei A. Koval" wrote:

    > As for Russian, there is a system called RUSLO (abbreviated from the Russian
    > "RUSskoye SLOvoobrazovaniye" = "Russian Derivation") developed by
    > N.N.Pertsova, A.V.Cheremkhin, A.V.Rafaeva.
    > Some details are available at
    > http://194.226.57.46/uvk1838/Sciper/volume1/pertsova.htm

    --- Turkish

    Kemal Oflazer wrote:

    > You may want to take a look at the morphological analyzer for Turkish
    > reachable from http://www.sabanciuniv.edu/fens/people/oflazer/

    I have tried this one out. It seems to do quite a lot, and it is especially interesting because it treats both word formation and inflection.

    --- Multilingual

    Word-Manager (German, English, Italian, ...)

    ---

    More general information about morphology systems (dealing mostly with
    inflection) can be found at:

    http://www.sil.org/computing/comp-morph-phon.html
    http://www.xrce.xerox.com/competencies/content-analysis/fsnlp/morph.en.html

    --
    Dr. Anke Lüdeling
    Institut für Kognitionswissenschaft, Universität Osnabrück
    Katharinenstr. 24, 49069 Osnabrück, Germany
    phone: +49-541-9694073
    fax: +49-541-9696210
    homepage: http://www.cogsci.uni-osnabrueck.de/~aluedeli
