Re: [Corpora-List] Re. Concordancer for Chinese (Summary of reply)

From: Mike Scott (mike@lexically.net)
Date: Mon Oct 07 2002 - 13:00:47 MET DST


    As I understand it from Chinese CL linguists such as Scott Piao,
    determining word boundaries in Chinese (and some other languages) is a
    highly complex matter. The strategy I am using in WordSmith Tools version 4
    is threefold:

    a) assume that text in such languages has been pre-processed to insert
    suitable word-boundary markers;
    b) where this has not been done, allow the user to specify a list of
    common sequences for WordSmith to use in pre-processing (inserting
    suitable word-boundary markers);
    c) failing this, equate "word" and "character".
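
The three-step fallback above can be sketched roughly as follows. This is only an illustration in Python, not WordSmith's actual code: the function name `tokenize`, the parameter `known_sequences`, the use of a space as the word-boundary marker, and the greedy longest-match rule in step (b) are all assumptions made for the example.

```python
def tokenize(text, known_sequences=None, boundary=" "):
    """Split text into "words" using a three-step fallback (illustrative only)."""
    # (a) Assume the text was pre-processed: if the boundary marker occurs,
    #     simply split on it. (Treating any occurrence of the marker as proof
    #     of pre-processing is a simplification.)
    if boundary in text:
        return [tok for tok in text.split(boundary) if tok]

    # (b) No markers, but the user supplied a list of common sequences:
    #     scan left to right, preferring the longest listed sequence at each
    #     position (an assumed matching rule); unmatched characters stand alone.
    if known_sequences:
        by_length = sorted(set(known_sequences), key=len, reverse=True)
        tokens, i = [], 0
        while i < len(text):
            match = next(
                (seq for seq in by_length if seq and text.startswith(seq, i)),
                None,
            )
            if match is None:
                match = text[i]
            tokens.append(match)
            i += len(match)
        return tokens

    # (c) Failing this, equate "word" and "character".
    return list(text)


if __name__ == "__main__":
    print(tokenize("我 喜欢 语言学"))                      # step (a)
    print(tokenize("我喜欢语言学", ["喜欢", "语言学"]))    # step (b)
    print(tokenize("我喜欢"))                              # step (c)
```

Greedy longest match is only one way to apply a user-supplied list; it can mis-segment where listed sequences overlap, which is part of why the problem is described as highly complex.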

    Cheers -- Mike

    At 17:15 07/10/2002 +0800, Linda Lin wrote:
    >Dear All
    >
    >Thanks for your information about concordancers for the Chinese language. I
    >have a question regarding the use of these concordancers. Do you think the
    >recommended concordancers, such as MonoConc Pro, can only recognize individual
    >characters, not actual "words" (i.e. strings of characters), or can they in
    >fact process actual "words"?
    >

    Mike Scott

    Applied English Language Studies Unit
    University of Liverpool
    Liverpool L69 3BX, UK.

    mike.scott@liv.ac.uk
    http://www.lexically.net
    http://www.liv.ac.uk/~ms2928


