RE:Corpora:Corpus Analysis & Discourse Markers

Sarah Oates (Sarah.Oates@itri.brighton.ac.uk)
Thu, 22 Jul 1999 12:43:32 +0100

A SOLUTION HAS BEEN FOUND!!

To summarise, I needed a corpus tagged to distinguish discourse markers
from other parts of speech, so that I could query a particular word
functioning as a discourse marker and get back only sentences like (a).
Searching the BNC returned sentences like both (a) and (b), so I had to
go through the results again by hand to pick out the discourse markers.

(a) I wanted to be funny SO I started telling jokes. (discourse marker
    of 'cause')
(b) He is SO funny. (adverb, not a discourse marker)

(a) I decided to do linguistics SINCE I am interested in languages and
    how we use them. (discourse marker of 'justification')
(b) I have been doing this PhD SINCE the beginning of October.
    (temporal preposition, not a discourse marker)

(a) The party will go ahead PROVIDED THAT Mary brings the food.
    (discourse marker of 'condition')
(b) Mary PROVIDED THAT food for the party. (verb + determiner, not a
    discourse marker)

I've managed to find a corpus that is tagged for discourse markers:
ICE-GB, the British component of the International Corpus of English.
It not only distinguishes discourse markers from other parts of speech,
but also distinguishes between different types of discourse marker, for
example whether a marker is a subordinator or a coordinator, or whether
it can be classed as an interjection or a connective (amongst many
other things).

Anyone interested in the corpus can find it at:

http://www.ucl.ac.uk/english-usage/ice-gb/index.htm

Thanks for all your responses; they were very helpful.

Sarah
