Re: [Corpora-List] Tags in Word Smith

From: Mike Scott (mike@lexically.net)
Date: Mon Feb 17 2003 - 19:11:10 MET

    Randall Jones wrote:
    I have a question that I hesitate to ask because I'm sure the answer is
    obvious. I have a tagged German text. I want to run WordList in Word
    Smith Tools in a way that the tags will differentiate homographs, e.g. sein
    (verb and pronoun), da (adverb and conjunction), etc. I would think that
    because the words have different tags that they appear differently in the
    list. However, thus far I have been successful in ignoring the tags or
    having them treated as separate words. In both cases the different uses of
    sein etc. are grouped together.

    What am I doing wrong?

    ***********************

    There should be an obvious solution, but I'm afraid there isn't.
    In WordSmith 3.0, one way to solve this problem is to make sure your tags
    are treated as part of the "word". As you will know, the apostrophe is by
    default, for English, included in a word as an "acceptable mid-word
    character", so to speak. If your text were tagged like this, you'd get the
    results you want:

    John'PROPERNOUN is'VERB on'PREP the'DET john'NOUN

    You could also set another symbol as an acceptable mid-word character, say %:

    John%PROPERNOUN is%VERB on%PREP the%DET john%NOUN

    (I haven't tested this but it *should* work. Test on a small text first,
    then if OK, you could make a copy of your corpus and use Text Converter to
    make the changes.)
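
    To illustrate the idea outside WordSmith, here is a minimal Python sketch.
    It is not WordSmith's own code and does not use Text Converter. It assumes,
    purely for illustration, that the corpus is currently tagged as word_TAG
    (the underscore format and the tag names VERB, PRON, ADV, CONJ are
    hypothetical), and it imitates the "acceptable mid-word character" setting
    with a regular expression. The function names retag and word_list are
    invented for this example.

```python
import re
from collections import Counter

LETTERS = r"[^\W\d_]+"  # one or more Unicode letters (covers ä, ö, ü, ß)

def retag(text, joiner="%"):
    """Rewrite word_TAG as word%TAG (the word_TAG input format is an assumption)."""
    return re.sub(
        rf"({LETTERS})_([A-Z]+)",
        lambda m: f"{m.group(1)}{joiner}{m.group(2)}",
        text,
    )

def word_list(text, mid_word_chars="'%"):
    """Count tokens, allowing the given characters inside a word."""
    if mid_word_chars:
        mid = re.escape(mid_word_chars)
        pattern = rf"{LETTERS}(?:[{mid}]{LETTERS})*"
    else:
        pattern = LETTERS
    # Lower-casing is a choice made for this sketch, not a claim about WordSmith.
    return Counter(tok.lower() for tok in re.findall(pattern, text))

tagged = retag("sein_VERB sein_PRON da_ADV da_CONJ")
with_percent = word_list(tagged)          # % allowed inside words
without_percent = word_list(tagged, "'")  # % treated as a word boundary
```

    With % allowed mid-word, sein%verb and sein%pron are counted as separate
    entries; without it, the tags split off as words of their own and both
    uses of sein are grouped together, which is the behaviour described in the
    question.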

    In WS4 (emerging blinking into the daylight from a long dark tunnel) I will
    think of a neater way than this of working! Am still refining tag treatment
    so this query came at a good moment.

    Mike Scott

    Applied English Language Studies Unit
    University of Liverpool
    Liverpool L69 3BX, UK.

    Mike.Scott@liv.ac.uk
    http://www.lexically.net
    http://www.liv.ac.uk/~ms2928