Re: Corpora: generalisation in text

(no name) ((no email))
Thu, 10 Sep 1998 09:28:50 +0200

>Hello,
>
>Does anyone have any ideas as to how to measure and
>compare generalisations in text corpora? Working and conducting
>research in the advanced EFL context (university level), I would very
>much appreciate having a set of simple corpus-analysis measurements
>to help me automatically identify student texts that ramble on as
>opposed to those which develop ideas in a more proper in-depth way.
>My original plan was to use the WordNet system to tag my data
>semantically and (mainly) compare statistics of hypernym/hyponym
>depth in the nouns found in the corpora, but it does not seem to
>be working very well plus it is very straining since I do not have
>access to any semantic tagging software.
>
>I will very much welcome any ideas, suggestions or pointers that
>could help develop/refine/improve my approach. Also, if you
>happen to know of any semantic tagging systems available to the
>public or for research, I'd be grateful for a tip.
>

Hi

I can see three approaches to the problem of estimating the depth of
ideas in a text, but I cannot tell which one you have in mind. The
approaches are:

1. To suppose that the deeper the synsets occur in the
pseudo-hierarchy, the deeper the text is (a criterion of specialization).

2. To suppose that the shallower the synsets occur in the
pseudo-hierarchy, the deeper the text is (a criterion of abstraction: a
text that deals with abstract and complex topics).

3. To suppose that the bigger the difference between deep and
shallow concepts, the deeper the text is (an intermediate criterion).
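
As a rough sketch only, the three criteria could be turned into scores
over the depths of the noun synsets found in one text. The function
name depth_scores, the dictionary keys, the use of the mean, and the
sample depths are all illustrative assumptions, not part of WordNet or
of any existing tool:

```python
from statistics import mean


def depth_scores(depths):
    """Return one score per criterion for a list of synset depths.

    `depths` holds, for each noun in a text, the number of hypernym
    links between its synset and the top of the hierarchy.
    """
    if not depths:
        raise ValueError("need at least one synset depth")
    avg = mean(depths)
    return {
        # Criterion 1: deeper synsets on average -> higher score.
        "specialization": avg,
        # Criterion 2: shallower synsets on average -> higher score.
        "abstraction": -avg,
        # Criterion 3: gap between the deepest and shallowest synset.
        "spread": max(depths) - min(depths),
    }


# Hypothetical depths for the nouns of one student text.
scores = depth_scores([14, 3, 7, 5])
print(scores)
```

With scores defined this way, criteria 1 and 2 rank any set of texts in
exactly opposite orders, so at most one of them can match the intended
notion of depth.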

My next comments apply to any one of these criteria:

I do not think any of them will work very well. The problem with
WordNet is the variable depth of the hierarchy. There are very deep
branches (around 15 links) for specialized topics like biology (the
taxonomy of animal genera):

1 sense of welsh springer spaniel

Sense 1
Welsh springer spaniel
  => springer spaniel
    => spaniel
      => sporting dog, gun dog
        => hunting dog
          => dog, domestic dog, Canis familiaris
            => canine, canid
              => carnivore
                => placental, placental mammal, eutherian, eutherian mammal
                  => mammal
                    => vertebrate, craniate
                      => chordate
                        => animal, animate being, beast, brute, creature, fauna
                          => life form, organism, being, living thing
                            => entity, something

Nevertheless, other topics that are no less complex or specialized are
represented by much flatter branches (psychological features):

3 senses of ego

(...)

Sense 3
ego
  => mind, head, brain, psyche, nous
    => cognition, knowledge
      => psychological feature

So I think that depth in the pseudo-hierarchy is not a good indicator of
the complexity of the topics in a text.
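
The imbalance between the two branches quoted above can be checked by
counting links. The following is a minimal sketch, not a WordNet
interface: the table HYPERNYM is copied by hand from the two listings
in this message (first word of each synset only), and the names
HYPERNYM and depth are illustrative assumptions.

```python
# Toy excerpt: each entry maps the first word of a synset to the first
# word of its hypernym, as listed in the two branches quoted above.
HYPERNYM = {
    "Welsh springer spaniel": "springer spaniel",
    "springer spaniel": "spaniel",
    "spaniel": "sporting dog",
    "sporting dog": "hunting dog",
    "hunting dog": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "life form",
    "life form": "entity",
    "ego": "mind",
    "mind": "cognition",
    "cognition": "psychological feature",
}


def depth(word):
    """Count hypernym links from `word` up to the top of its branch."""
    links = 0
    while word in HYPERNYM:
        word = HYPERNYM[word]
        links += 1
    return links


print(depth("Welsh springer spaniel"))  # 14
print(depth("ego"))                     # 3
```

On these two listings the dog breed sits 14 links below the top and
"ego" only 3, although neither term is obviously the more specialized
one.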

I hope these comments will be useful. Regards

José María

_____________________________________________________________________________

Jose Maria Gomez Hidalgo
Departamento de Inteligencia Artificial
Universidad Europea de Madrid - CEES
28670 - Villaviciosa de Odon - MADRID Tfno: (91) 616 94 00 Ext. 670
e-mail: jmgomez@dinar.esi.uem.es WWW: http://www.esi.uem.es/~jmgomez/
_____________________________________________________________________________