Corpora: Q: what's behind Part-of-speech (II)?

Ji Donghong (dhji@krdl.org.sg)
Wed, 29 Apr 1998 13:36:46 +0800 (SST)

[ Apologies for the duplicate posting ]

Several days ago, I posed the query "what's behind part-of-speech?". So
far, more than 10 researchers have replied to me. I would now like to pose
another query on the topic before presenting a summary:

Q: Is part-of-speech, as defined by syntactic distribution, a
WELL-FORMED concept?

By "WELL-FORMED" I mean that the following conditions 1) and 2) hold:


1) For a particular language, we can select, based on some reasonable
principles, a subset of all the distributional information to serve as
criteria for POS definition.

/* NOTE to 1): For a particular language, there may be too much
syntactic distribution information for us to list
it all. */

2) We can select, based on some principles, some of the classes that
could possibly be produced as part-of-speech categories.

/* NOTE to 2): Given a fixed set of distributional information as
constraints, a great many classes can be produced from
the constraints or from combinations of them.
For example, given only two constraints, c1 and c2, we
may get the following conditions, each condition
corresponding to one class:

c1, c2, c1 AND c2, c1 OR c2, NOT_c1, NOT_c2, ...

It seems difficult to deny that these are all classes
based on distribution. */
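
As a rough illustration of the note above, the following Python sketch
enumerates every class that Boolean combinations of two constraints can
define. It is only a sketch under stated assumptions: the four words
w1..w4 and their constraint values are invented for the example and are
not drawn from any corpus, and candidate_classes is a hypothetical helper
name.

```python
from itertools import product

# Hypothetical toy data: each word is marked for whether it satisfies
# two invented distributional constraints, in the order (c1, c2).
LEXICON = {
    "w1": (True, True),
    "w2": (True, False),
    "w3": (False, True),
    "w4": (False, False),
}


def candidate_classes(lexicon, n_constraints=2):
    """Return every distinct word class definable by a Boolean
    combination of the constraints."""
    # All possible assignments of truth values to the constraints.
    rows = list(product([False, True], repeat=n_constraints))
    classes = set()
    # A Boolean combination is fully described by its truth table:
    # one True/False output for each row of constraint values.
    for outputs in product([False, True], repeat=len(rows)):
        table = dict(zip(rows, outputs))
        members = frozenset(w for w, vals in lexicon.items() if table[vals])
        classes.add(members)
    return classes


classes = candidate_classes(LEXICON)
print(len(classes))
```

In general, n constraints admit 2^(2^n) Boolean combinations, so the
number of candidate classes grows very quickly; how many of them are
actually distinct depends on the words in the lexicon.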

THE AIM OF THIS QUERY:

1) If the answer to the query is YES, we should face the difficulties
in Chinese POS definition and try to find what really leads to
them.

2) If the answer to the query is NO, we should no longer seek a POS
definition based on distribution, especially for languages with few
affixes, e.g., Chinese, and should turn to other criteria.

Any comments or information will be highly appreciated. I will present a
summary.

Ji Donghong

-----------------------
Kent Ridge Digital Labs
Singapore 119613
Email: dhji@krdl.org.sg
Tel: 65-8746380
Fax: 65-7744998
-----------------------