Corpora: Corpus Analysis of Hypertext - Results

Einat Amitay (einat@mpce.mq.edu.au)
Mon, 30 Mar 1998 17:57:06 +1000

Hello

Almost a year ago I posted a request to this list asking people to
mail me the URLs of their homepages in order to create a corpus of
manually authored HTML files.

The results of the corpus analysis can be found in my MSc dissertation
(http://www.mri.mq.edu.au/~einat) and you are welcome to have a look at
them. I think many of you who work with hypertext might find this study
interesting, and I would appreciate any comments since we are about to
publish the results.

The abstract is below. Thank you very much for your help!
einat

--
Einat Amitay
einat@mri.mq.edu.au
http://www.mri.mq.edu.au/~einat

-------------------------------------------------------------------------------------------------

ABSTRACT

Dillon et al. (1993) observed, when hypertext authoring on the web
was just beginning to become popular in the non-academic world, that
there is a problem of schemata, or genre conception, in hypertext,
because of the flexible nature of language and the varied layout used in
its creation. Today, almost five years later, the web is used by many
people and there are conventions which have evolved from usage and
experience. In the years since then, users have become aware of
the existence of other users by interacting with their hypertext
documents and by creating their own homepages. Through analysing two
corpora consisting of 1000 HTML files retrieved from the World Wide Web,
this study describes the linguistic conventions with which hypertext
documents are being written. It is claimed here that hypertext is a new
linguistic genre and that it should be treated as such in future
studies. It is also suggested in this dissertation that studying these
conventions and applying the gained knowledge to existing academic work
would be beneficial to both hypertext users and the research community.
--------------------------------------------------------------------------------------------------