Corpora: Re morphological/PoS ambiguity

Diana Sousa Marques Pinto Dos Santos (diana.santos@ilf.uio.no)
Tue, 10 Feb 1998 14:30:38 +0100 (MET)

On the subject of morphological/PoS ambiguity,
let me take this opportunity to note that we have published some detailed
quantitative results on Portuguese (based on a small corpus) in
Medeiros et al. (1993) and Santos (1994). Both are written in
Portuguese, though.

(See the detailed references on my publications page:
http://www.hf.uio.no/~dianasa/public.html)

I should also note that Eckhard Bick's remark on the imperative is a good
illustration of how the analysis, and the results obtained, depend on
the researcher's point of view.

In Portuguese, for forms other than the second person, the present
subjunctive is used for the imperative mood. In my analysis, there is
no _morphological_ ambiguity there. What there is, rather, is a syntactic
ambiguity: is the sentence in the imperative mood, or in the
subjunctive? That is generally easy to disambiguate in context,
even though there are borderline cases.

One may of course say that the same is true of infinitive/present in
English, and that these have received different tags in every "PoS" tagger
ever written. But in English there are at least some verbs whose
morphological paradigm distinguishes the two, such as the
verb to BE. There is _no_ such case in Portuguese, so why load the
morphology component with things that do not belong there?

I thought it was worthwhile sending this to the list, because it
looks like some people see a PoS tagger as a morphological
disambiguator. In my view, it is not (only) that. It is a shallow syntactic
analyser, and there is no reason why it cannot add information (tags)
to words/tokens that comes not from the lexical items themselves
but from the syntactic context (as would be the case when assigning
imperative mood, for example).
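
To make the point concrete, here is a minimal sketch in Python of a tagger
that adds a tag from context rather than choosing among lexical alternatives.
Everything in it is an assumption made for illustration: the tag names, the
tiny lexicon, and above all the rule "a subjunctive form that opens the
sentence is an imperative", which is a deliberately crude stand-in for real
syntactic disambiguation. It is not how Palavroso or any existing tagger works.

```python
# Toy lexicon (hypothetical): each form lists only what morphology justifies.
# "fale" is stored once, as a present subjunctive; no imperative entry exists.
LEXICON = {
    "fale": ["V.PRES.SUBJ"],
    "espero": ["V.PRES.IND"],
    "que": ["CONJ"],
    "ele": ["PRON"],
    "mais": ["ADV"],
    "alto": ["ADV"],
}


def lookup(tokens):
    """Return (token, analyses) pairs; unknown forms get the tag 'UNK'."""
    return [(tok, list(LEXICON.get(tok.lower(), ["UNK"]))) for tok in tokens]


def tag(tokens):
    """Take the first lexical analysis, then let the context add information."""
    tagged = []
    for position, (tok, analyses) in enumerate(lookup(tokens)):
        label = analyses[0]
        # Assumed heuristic, for illustration only: a present subjunctive
        # form in sentence-initial position is read as an imperative.
        # The tag V.IMP never appears in the lexicon; the context creates it.
        if label == "V.PRES.SUBJ" and position == 0:
            label = "V.IMP"
        tagged.append((tok, label))
    return tagged


print(tag(["Fale", "mais", "alto"]))
print(tag(["Espero", "que", "ele", "fale"]))
```

In the first sentence "Fale" comes out as V.IMP, in the second "fale" keeps
its lexical tag V.PRES.SUBJ, and the lexicon itself never had to list both.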

If one insists that a PoS tagger must simply choose between the
alternatives which are already there, one has to put in the lexicon
(and call it morphology? syntactic features?) a lot of things which do
not necessarily belong there. (Maybe one should use a name other than
"PoS tagger" for the current taggers? Leech's "grammatical word tagging"
seems definitely better to me.)

It seems to me that the "ambiguity of a language", as discussed in
Eckhard Bick's message, is simply a function of the number of distinctions
one wants to draw in one's system, and not really a sensible
crosslinguistic measure of comparison.
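
A small sketch in Python may illustrate this. The lexicon, the tag names and
the four-word "corpus" below are invented for the example and are not data
from any of the studies mentioned; the only point is that the same words
yield different ambiguity figures depending on whether the system draws the
imperative/subjunctive distinction in the lexicon.

```python
def ambiguity_rate(tokens, lexicon, collapse=lambda tag: tag):
    """Fraction of tokens with more than one distinct tag after collapsing."""
    ambiguous = 0
    for tok in tokens:
        distinct_tags = {collapse(t) for t in lexicon[tok]}
        if len(distinct_tags) > 1:
            ambiguous += 1
    return ambiguous / len(tokens)


# Hypothetical lexicon that lists imperative and subjunctive separately.
FINE_LEXICON = {
    "fale": ["V.SUBJ", "V.IMP"],
    "cante": ["V.SUBJ", "V.IMP"],
    "casa": ["N", "V.IND"],
    "mesa": ["N"],
}


def merge_moods(tag):
    """Treat the imperative reading as the same form as the subjunctive."""
    return "V.SUBJ" if tag == "V.IMP" else tag


tokens = ["fale", "cante", "casa", "mesa"]
print(ambiguity_rate(tokens, FINE_LEXICON))               # 0.75
print(ambiguity_rate(tokens, FINE_LEXICON, merge_moods))  # 0.25
```

Nothing about the language changed between the two figures, only the number
of distinctions the tagset was asked to draw.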

In any case, it was not by chance that when we devised Palavroso, a
morphological analyser for Portuguese, we did not treat what we
considered unambiguous subjunctive forms as imperative/subjunctive
homographs.

Diana
------------------------------------------------------------------------
Diana Santos Tel: +47-22 85 71 10
The Text Laboratory E-mail: diana.santos@ilf.uio.no
Department of linguistics Fax: +47-22 85 69 19
University of Oslo http://www.uio.no/~dianasa/
P.O.box 1102 Blindern
N-0317 Oslo, Norway
http://www.hf.uio.no/tekstlab/
------------------------------------------------------------------------