Re: [Corpora-List] POS tagging via relational databases

From: Zhang Le (ejoy@xinhuanet.com)
Date: Fri Sep 26 2003 - 03:30:51 MET DST

Next message: Information Retrieval: "[Corpora-List] DIR-2003: Final call for papers"

Previous message: Mark Davies: "[Corpora-List] POS tagging via relational databases"
In reply to: Mark Davies: "[Corpora-List] POS tagging via relational databases"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello Mark Davies,
I found your mail interesting, but I have doubts about the approach's
performance, in terms of both speed and accuracy.
First, a DB is just another way of representing a linear sequence of text
tokens, so the performance will be at most as good as that of a traditional
tagger.
Second, although DB operations can be fairly efficient, it is
relatively hard to incorporate other resources and more powerful features
(such as word n-grams, or a word segmenter in the Chinese case) without
special treatment. Do you know of any experiments like this that have been
carried out so far? I would be glad to hear about them.
On Wed, 24 Sep 2003 13:17:55 -0600, Mark Davies <Mark_Davies@byu.edu>
wrote:

> Is anyone aware of projects in which relational databases have been used
> to do POS tagging? Rather than passing through a linear text token by
> token, it would all be done via adjacent rows in the database, using
> subqueries or JOINs. For example, you would have a table with N number
> of rows, where N = number of words in the corpus. Each row would have
> the following structure (lemma would probably be here as well):
> ...
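
To make the quoted idea concrete, here is a minimal sketch of looking at
adjacent rows with a JOIN instead of walking the text token by token. The
row structure in the original mail is elided, so the table name `tokens`,
the columns `pos` and `word`, the tags `NN` and `UNK`, and the toy rule
("a word directly after 'the' is a noun") are all assumptions made for
illustration, not the schema or method proposed in the thread.

```python
import sqlite3


def tag_with_context(words):
    """Tag each word using the previous row, found via a self-JOIN.

    Hypothetical schema: one row per corpus token, keyed by its position.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (pos INTEGER PRIMARY KEY, word TEXT)")
    conn.executemany(
        "INSERT INTO tokens (pos, word) VALUES (?, ?)",
        list(enumerate(words, start=1)),
    )
    # LEFT JOIN keeps the first token, which has no previous row.
    # Toy rule (an assumption): a word right after 'the' is tagged NN.
    rows = conn.execute(
        """
        SELECT cur.word,
               CASE WHEN prev.word = 'the' THEN 'NN' ELSE 'UNK' END
        FROM tokens AS cur
        LEFT JOIN tokens AS prev ON prev.pos = cur.pos - 1
        ORDER BY cur.pos
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for word, tag in tag_with_context(["the", "dog", "barks"]):
        print(word, tag)
```

A real tagger would need many such rules or a statistical model; the sketch
only shows how one piece of left context can be reached through a JOIN on
the position column.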

-- 
Zhang Le
Natural Language Processing Lab
Northeastern University, P.R.China
http://www.nlplab.cn/zhangle/

