Hello Mark Davies,
I found your mail interesting, but I have doubts about its performance,
both in terms of speed and accuracy.
First, a DB is just another representation of the linear sequence of text
tokens, so the performance will be at most as good as a traditional tagger.
Second, although the DB operations can be fairly efficient, it is
relatively hard to incorporate other resources and more powerful features
(such as word N-grams, or a word segmenter in the Chinese case) without
special treatment. Do you know of any experiments like this that have been
carried out so far? I would be glad to hear about them.
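
For readers who want the adjacent-row idea in concrete form, here is a
minimal sketch in Python using the standard sqlite3 module. The table layout
(pos, word, tag), the toy lexicon and the single contextual rule are all
assumptions made up for illustration; the row structure in the original mail
is elided and is not reproduced here.

```python
import sqlite3

# Hypothetical toy lexicon: one default tag per word (illustrative only).
LEXICON = {"the": "DT", "can": "MD", "rusts": "VBZ", "i": "PRP", "swim": "VB"}


def tag_with_join(words):
    """Tag a token list by storing one row per token and applying one
    contextual rule through a correlated subquery on the previous row."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: pos is the token's position in the corpus.
    conn.execute(
        "CREATE TABLE tokens (pos INTEGER PRIMARY KEY, word TEXT, tag TEXT)"
    )
    # Initial pass: assign each word its default tag (NN if unknown).
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?)",
        [(i, w, LEXICON.get(w.lower(), "NN")) for i, w in enumerate(words)],
    )
    # Contextual rule: a modal directly after a determiner is retagged as
    # a noun. The "adjacent row" is found via prev.pos = tokens.pos - 1.
    conn.execute(
        """
        UPDATE tokens SET tag = 'NN'
        WHERE tag = 'MD'
          AND EXISTS (SELECT 1 FROM tokens AS prev
                      WHERE prev.pos = tokens.pos - 1
                        AND prev.tag = 'DT')
        """
    )
    rows = conn.execute("SELECT word, tag FROM tokens ORDER BY pos").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(tag_with_join(["the", "can", "rusts"]))
    print(tag_with_join(["I", "can", "swim"]))
```

Each rule is a single set-based statement over the whole table, but every
additional feature (a wider context window, word N-grams, segmentation)
would need further joins or extra tables, which is the "special treatment"
mentioned above.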
On Wed, 24 Sep 2003 13:17:55 -0600, Mark Davies <Mark_Davies@byu.edu>
wrote:
> Is anyone aware of projects in which relational databases have been used
> to do POS tagging? Rather than passing through a linear text token by
> token, it would all be done via adjacent rows in the database, using
> subqueries or JOINs. For example, you would have a table with N number
> of rows, where N = number of words in the corpus. Each row would have
> the following structure (lemma would probably be here as well):
> ...
--
Zhang Le
Natural Language Processing Lab
Northeastern University, P.R.China
http://www.nlplab.cn/zhangle/