[Corpora-List] QTAG tag assignment problem

From: Tony Berber Sardinha (tony4@uol.com.br)
Date: Wed Jun 18 2003 - 13:35:12 MET DST


    Dear list members

    I wonder if anyone could help me with a QTAG tagging problem (wrong tag
    assignment). I'm using a Portuguese language model based on 500K words of tagged
    data.

    For example, it tagged the Portuguese preposition 'de' as:

    de_CJ

    The correct output would be 'de_PRP'. There is not a single occurrence of
    'de_CJ' in the training corpus.
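
    The absence of 'de_CJ' can be checked with a short script. The sketch below
    assumes the training data is available as whitespace-separated 'word_TAG'
    tokens (the same shape as the tagger output above); the function name
    tag_counts and the sample sentence are illustrative only, not part of QTAG.

```python
from collections import Counter


def tag_counts(tokens, word):
    """Count the tags attached to `word` in an iterable of 'word_TAG' tokens.

    Assumption: each token is 'form_TAG'; this is not a documented QTAG format.
    """
    counts = Counter()
    for token in tokens:
        # Split on the last underscore so forms containing '_' stay intact.
        form, sep, tag = token.rpartition("_")
        if sep and form.lower() == word:
            counts[tag] += 1
    return counts


# Toy sentence for illustration, not taken from the real training corpus.
sample = "A_ART casa_N de_PRP Pedro_PROP e_CJ de_PRP Ana_PROP".split()
print(tag_counts(sample, "de"))
```

    Run over the real corpus (for instance by reading the file and calling
    split() on its contents), any non-zero count for CJ would point to where
    the unexpected tag comes from.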

    The possibilities given by the tagger (with the '-f ac' option) are:

    de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]

    This shows PRP as by far the most likely tag, although the count of 13834 does
    not match the training corpus frequency of 19886.
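
    To put numbers on "by far", the line above can be parsed and the counts
    turned into shares. This is a minimal sketch: the regular expression is
    only a guess at the layout based on the single line quoted here, and the
    second figure in each bracket (0.0) is ignored because its meaning is not
    stated. The name parse_lexicon_line is hypothetical.

```python
import re


def parse_lexicon_line(line):
    """Parse 'word : TAG [count:x] TAG [count:x] ...' into (word, {tag: count}).

    Assumption: the layout matches the single example line from the post.
    """
    word, _, rest = line.partition(" : ")
    entries = re.findall(r"(\S+) \[(\d+):([\d.]+)\]", rest)
    return word.strip(), {tag: int(count) for tag, count, _ in entries}


word, counts = parse_lexicon_line("de : CJ [28:0.0] IN [1:0.0] PRP [13834:0.0]")
total = sum(counts.values())
for tag, count in sorted(counts.items(), key=lambda kv: -kv[1]):
    # Columns: word, tag, raw count, share of all listed counts for this word
    print(f"{word}\t{tag}\t{count}\t{count / total:.4f}")
```

    On these figures PRP accounts for 13834 of the 13863 listed counts, so a
    CJ assignment cannot be explained by the lexical counts alone.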

    The tag most frequently assigned in error is N (noun).

    The tag frequencies in the training corpus are:

     110614 N
      61415 PT
      60166 V
      45531 PRP
      44735 ART
      35132 CPR
      27563 PROP
      20933 ADJ
      19281 CJ
      17530 PRN
      16506 ADV
       5258 NUM
        665 DESC
         94 IN
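
    For reference, the counts above sum to 465423 tagged tokens, and N alone
    makes up roughly 24% of them. A small sketch that reproduces the shares
    from the table (the dictionary simply restates the figures listed above):

```python
# Tag frequencies copied from the table in the post.
tag_freq = {
    "N": 110614, "PT": 61415, "V": 60166, "PRP": 45531, "ART": 44735,
    "CPR": 35132, "PROP": 27563, "ADJ": 20933, "CJ": 19281, "PRN": 17530,
    "ADV": 16506, "NUM": 5258, "DESC": 665, "IN": 94,
}

total = sum(tag_freq.values())
for tag, n in sorted(tag_freq.items(), key=lambda kv: -kv[1]):
    # Columns: raw count, tag, share of all tagged tokens
    print(f"{n:>7} {tag:<5} {n / total:6.2%}")
print(f"{total:>7} total")
```

    Whether the size of the N class has anything to do with N being the most
    common wrong tag is an open question; the shares only make the imbalance
    visible.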

    cheers
    tony.
    -------------------------------------
    Dr Tony Berber Sardinha
    LAEL, PUC/SP
    (Catholic University of Sao Paulo, Brazil)
    tony4@uol.com.br
    http://lael.pucsp.br/~tony


