Re: T-Units

Ted E. Dunning (ted@aptex.com)
Wed, 23 Apr 1997 14:28:39 -0700

relative to the recent discussion of T-units, it is interesting to
note that something very much like a T-unit can be detected
statistically. in their work on machine translation by primarily
statistical means, the group at IBM (Mercer, Brown, Della Pietra^2 et
al) found that the cost to translate a sentence increased very rapidly
with increasing sentence length. to remedy this, they broke sentences
at what they call "rifts". rifts were detected by examining
word-aligned translated sentences and looking for groups of words that
seemed to align as a unit, as indicated by the fact that words from
other units did not intrude. based on this alignment data, they
developed a statistical rift detector.
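
to make the "no intrusion" idea concrete, here is a minimal sketch of how candidate rifts might be read off a single word alignment. it is my own simplified reading, not the IBM group's code: the function name find_rifts, the (source, target) index-pair format, and the rule that everything left of the break must align strictly before everything right of it are all assumptions made for illustration. it only marks rifts in already-aligned data; the statistical detector trained on such data is not shown.

```python
def find_rifts(alignment, source_len):
    """Return source positions where a break leaves no crossing links.

    alignment  -- list of (source_index, target_index) pairs (hypothetical format)
    source_len -- number of words in the source sentence

    Position i means "break between source word i-1 and source word i".
    Assumption: i is a rift when every target word linked to the left
    side precedes every target word linked to the right side.
    """
    rifts = []
    for i in range(1, source_len):
        left = [t for s, t in alignment if s < i]
        right = [t for s, t in alignment if s >= i]
        if not left or not right:
            # one side has no aligned words, so nothing can be decided here
            continue
        if max(left) < min(right):
            rifts.append(i)
    return rifts


# toy alignment with two local reorderings (source words 1-2 and 5-6 swap)
example_alignment = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 4), (5, 6), (6, 5)]
example_rifts = find_rifts(example_alignment, 7)
print(example_rifts)  # [1, 3, 4, 5]
```

in the toy example, no break is allowed at positions 2 or 6, because the swapped word pairs would be split and a link from one side would intrude on the other.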

my subjective impression is (was?) that the units between rifts looked
a lot like the T-units that bruce just defined.