FW: Corpora: Morphology and Word Length (was: Relative text length)

From: Tadeusz Piotrowski (tadpiotr@plusnet.pl)
Date: Fri Apr 26 2002 - 20:43:31 MET DST

    Is there really any language-independent morphology? I doubt it, and I
    recall that even for a single language there are conflicting views on
    morphology; that is, a word has as many morphemes as the theory allows.
    Tadeusz Piotrowski

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no
    > [mailto:owner-corpora@lists.uib.no] On Behalf Of Mike Maxwell
    > Sent: Friday, April 26, 2002 3:37 PM
    > To: corpora@lists.uib.no
    > Subject: Corpora: Morphology and Word Length (was: Relative
    > text length)
    > Damlon Davison writes:
    > >It may be obvious, but agglutinating languages
    > >tend to have longer words
    > --or at least the _average_ length of words in agglutinating
    > languages tends to be longer, which presumably is what is
    > meant here. Languages like English that have substantial
    > derivational morphology can have some long words, but a
    > glance at a text in an agglutinating language like Quechua
    > will show the difference in average length.
    > I suspect polysynthetic languages also have long words,
    > but whether that's true on the average, or only of
    > some words (verbs with incorporated nouns, say), I don't
    > know. I've never looked at an extended text in such a
    > language. And of course compounding can create long words
    > (look at a German text), and perhaps reduplication in
    > languages that use whole-word reduplication.
    > I suspect that another influence on word length is the
    > phonology: languages with large phoneme inventories tend to
    > have shorter words. Does anyone have data on this? E.g.
    > languages with large numbers of consonants (the Caucasus
    > region?), or languages with lots of tones (some Chinese
    > languages--in Romanized scripts, of course--or the Chinantec
    > languages of Mexico), as opposed to languages like Hawai'ian,
    > which is notorious for a small phoneme inventory (around 13,
    > as I recall) and long words.
    > Since there are at least two factors related to word length
    > (morphology and phonology), and several different factors
    > within morphology, I wonder whether anyone has experimented
    > with automatic classification of morphological type. We're
    > having a workshop at the ACL this summer on morphology
    > learning, but it ought to be possible to get a rough idea of how
    > many affixes there are without learning the "entire"
    > morphology. Perhaps just seeing how compressible a text is
    > would give you some idea, or turning it into a minimized FSA.
    > Finally, there is a big caveat: the length of a word depends
    > very much on orthographic decisions. Are clitics written
    > solid? Compounds?
    > Written German has long 'words' because the compound nouns
    > are written solid. If they were written with a space between
    > the nouns, the word length would become a lot shorter--not to
    > mention how much easier it would be to read. I guess the
    > original observation on this is by Mark Twain :-).
    > I have even heard of a language where the linguist who
    > designed the orthography decided to write a space between
    > each morpheme, turning an agglutinating language into an
    > isolating language in the orthography! (One wonders how the
    > written language will look after a generation or two.)
    > Mike Maxwell
    > Linguistic Data Consortium
    > maxwell@ldc.upenn.edu
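
The two rough measurements mentioned in the quoted message (average word length, and how compressible a text is) can be tried out with a few lines of Python. This is only a minimal sketch under stated assumptions, not a method proposed in the thread: the function names are hypothetical, a "word" is taken to be a maximal run of letters (so an apostrophe or hyphen splits a word), and zlib stands in for "a compressor" in general. Whether the compression ratio actually tracks morphological type is exactly the open question raised above.

```python
import re
import zlib

# Assumption: a "word" is a maximal run of Unicode letters.
# Digits, underscores, apostrophes and hyphens all act as separators.
WORD_RE = re.compile(r"[^\W\d_]+")


def mean_word_length(text):
    """Average number of characters per orthographic word (0.0 if none)."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def compression_ratio(text):
    """Compressed size divided by raw size in UTF-8 bytes (0.0 if empty).

    Lower values mean the text is more repetitive and compresses better.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # The orthographic caveat in miniature: the same material written
    # solid versus with spaces gives very different "word" lengths.
    solid = "Donaudampfschiff"
    spaced = "Donau dampf schiff"
    print(mean_word_length(solid), mean_word_length(spaced))
    print(compression_ratio(solid * 50))
```

Compression ratios are only comparable across languages when the samples are of similar size and encoded the same way, since UTF-8 uses more bytes per character for some scripts than for others.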