RE: [Corpora-List] Tags in Word Smith

From: Lee, David (dvdlee@umich.edu)
Date: Mon Feb 17 2003 - 20:15:41 MET


    If your POS tags are separated from the word by the underscore character (which is more readable than "%"), you can make the underscore an acceptable part of a 'word' by going to:

    Settings > Adjust Settings. Go to the "Text" tab and add "_" alongside the apostrophe which is already there.

    An important *second* step, *if your POS tag set includes numbers (e.g. NN1, NN2)*, is to go to the "WordList" tab and activate the checkbox for "numbers included". Otherwise you will end up with a wordlist that lacks all singular (NN1) and plural (NN2) nouns. You may also want to increase the "word length" setting at the same time, since every 'word' is now longer than before because of the attached tag.
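
For anyone who wants to see why both settings matter, here is a rough Python sketch of the idea. It is not how WordSmith works internally; the function `wordlist` and its parameters `extra_chars` and `include_numbers` are made up for illustration, the sample sentence uses CLAWS-style tags, and the sketch assumes that any character not allowed inside a word acts as a separator and that tokens containing digits are dropped unless numbers are included.

```python
import re
from collections import Counter


def wordlist(text, extra_chars="'", include_numbers=False):
    """Count tokens; hypothetical stand-in for a wordlist tool's settings.

    extra_chars: characters (besides letters and digits) allowed inside a word.
                 Must not be empty in this sketch.
    include_numbers: if False, tokens containing any digit are dropped.
    """
    # Letters and digits, plus the extra characters, may appear inside a token;
    # every other character acts as a separator.
    token_re = re.compile(r"(?:[^\W_]|[" + re.escape(extra_chars) + r"])+")
    counts = Counter()
    for token in token_re.findall(text):
        if not include_numbers and any(ch.isdigit() for ch in token):
            continue  # mimics leaving "numbers included" unchecked
        counts[token] += 1
    return counts


sample = "I_PNP can_VM0 open_VVI the_AT0 can_NN1"

# Underscore not allowed: tags split off, both uses of "can" are merged.
print(wordlist(sample))

# Underscore allowed but numbers excluded: every token with a digit in its tag vanishes.
print(wordlist(sample, extra_chars="'_"))

# Underscore allowed and numbers included: "can_VM0" and "can_NN1" are separate entries.
print(wordlist(sample, extra_chars="'_", include_numbers=True))
```

The three calls correspond to the default settings, the first step alone, and both steps together.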

    Dave.

    ___________________________________________________
    David YW Lee
    dvdlee@umich.edu
    Research Fellow, MICASE project
    English Language Institute, University of Michigan
    TCF Building, 401 E. Liberty, Suite 350, Rm 3140
    Ann Arbor, Michigan 48104-2298, USA. Tel: +1 734-615-9638 (O)

    MICASE web site: http://www.lsa.umich.edu/eli/micase/micase.htm
    Corpus-based Linguistics web site: http://devoted.to/corpora
    ___________________________________________________

    > -----Original Message-----
    > From: Randall Jones [mailto:randall_jones@byu.edu]
    > Sent: Mon, February 17, 2003 12:37 PM
    > To: CORPORA
    > Subject: [Corpora-List] Tags in Word Smith
    >
    >
    > I have a question that I hesitate to ask because I'm sure the answer is
    > obvious. I have a tagged German text. I want to run WordList in Word
    > Smith Tools in a way that the tags will differentiate homographs, e.g. sein
    > (verb and pronoun), da (adverb and conjunction), etc. I would think that
    > because the words have different tags that they appear differently in the
    > list. However, thus far I have been successful in ignoring the tags or
    > having them treated as separate words. In both cases the different uses of
    > sein etc. are grouped together.
    >
    > What am I doing wrong?
    >
    >
    > Randall L. Jones
    > Department of Germanic & Slavic Languages
    > Brigham Young University
    > Provo, Utah 84604 USA
    > randall_jones@byu.edu
    > http://humanities.byu.edu/faculty/JonesR.html


