Re: [Corpora-List] POS tagging via relational databases

From: Zhang Le (ejoy@xinhuanet.com)
Date: Fri Sep 26 2003 - 03:30:51 MET DST


    Hello Mark Davies,
      I found your mail interesting, but I have doubts about the approach's
    performance, in terms of both speed and accuracy.
    First, using a DB is just another way of representing the linear sequence
    of text tokens, so the performance will be at most as good as that of a
    traditional tagger.
    Second, although DB operations can be fairly efficient, it is
    relatively hard to incorporate other resources and more powerful features
    (such as word n-grams, or a word segmenter in the case of Chinese) without
    special treatment. Do you know of any experiments like this carried out so
    far? I would be glad to hear about them.
     On Wed, 24 Sep 2003 13:17:55 -0600, Mark Davies <Mark_Davies@byu.edu>
    wrote:

    > Is anyone aware of projects in which relational databases have been used
    > to do POS tagging? Rather than passing through a linear text token by
    > token, it would all be done via adjacent rows in the database, using
    > subqueries or JOINs. For example, you would have a table with N number
    > of rows, where N = number of words in the corpus. Each row would have
    > the following structure (lemma would probably be here as well):
    > ...
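
For concreteness, the adjacent-row idea in the quoted message might look roughly like the sketch below. It is only an illustration under stated assumptions: the row structure is elided in the quote, so the `corpus` table, its `id`/`word`/`pos` columns, the toy lexicon, and the single contextual rule are all hypothetical and are not Mark Davies' actual design. It uses Python's standard `sqlite3` module and a self-JOIN on `prev.id = cur.id - 1` to reach the preceding token.

```python
import sqlite3

def tag_corpus(tokens, lexicon):
    """Tag tokens with a lexicon lookup, then apply one contextual rule in SQL.

    Hypothetical schema: one row per token, ordered by an integer id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE corpus (id INTEGER PRIMARY KEY, word TEXT, pos TEXT)"
    )
    # One row per corpus token; ids start at 1 and follow text order.
    conn.executemany(
        "INSERT INTO corpus (id, word) VALUES (?, ?)",
        list(enumerate(tokens, start=1)),
    )
    # Initial tags: assign each word its default tag from the toy lexicon.
    conn.executemany(
        "UPDATE corpus SET pos = ? WHERE word = ?",
        [(pos, word) for word, pos in lexicon.items()],
    )
    # Contextual rule (assumed, for illustration only): a token tagged VERB
    # that directly follows a DET is retagged as NOUN. The self-JOIN pairs
    # each row with the row immediately before it.
    conn.execute(
        """
        UPDATE corpus SET pos = 'NOUN'
        WHERE pos = 'VERB'
          AND id IN (
              SELECT cur.id
              FROM corpus AS cur
              JOIN corpus AS prev ON prev.id = cur.id - 1
              WHERE prev.pos = 'DET'
          )
        """
    )
    rows = conn.execute("SELECT word, pos FROM corpus ORDER BY id").fetchall()
    conn.close()
    return rows

toy_lexicon = {"the": "DET", "can": "VERB", "is": "VERB", "red": "ADJ"}
tagged = tag_corpus(["the", "can", "is", "red"], toy_lexicon)
print(tagged)
```

The rule is applied to the whole table in one statement rather than token by token, which is the set-based style the quoted message describes. Richer features such as word n-grams would need further joins or auxiliary tables.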

    -- 
    Zhang Le
    Natural Language Processing Lab
    Northeastern University, P.R.China
    http://www.nlplab.cn/zhangle/
    


