Parallel corpora

A.M.Dickens@bton.ac.uk
Thu, 4 May 1995 17:33:39 GMT

Dear colleague,

Derek Lewis <d.r.lewis@exeter.ac.uk> recently circulated on the Corpora List
an outline of a number of multilingual text projects. The INTERSECT project
at Brighton, which I am involved with, was included in that outline.

We thought it would be useful for the corpus community to have some more
detailed information about these projects. We have therefore prepared a
questionnaire which we are sending to all the contact addresses we have. We'd
be very grateful if you could take the trouble to answer the questions as
fully as possible, and to return the questionnaire to us, either by email or
in printed form.

If we get a lot of responses we plan to put the information on an FTP server.

Thanking you in advance for your cooperation.

Alison Dickens,
The Language Centre,
University of Brighton
Falmer, Brighton, BN1 9PH
England.

Phone: (+44) 01273 600900 (Switchboard); 643302 (direct line); 643337 (office)
FAX: (+44) 01273 690710
Email: AMD2@BRIGHTON.AC.UK

***** Parallel Corpus Questionnaire *****

1. Name of project or corpus

2. Institution(s) where project is based

3. Contact person

4. Sources of funding

5. Description of corpus

* size

* sources of text

* types of text

* plain text or annotations (tagging, SGML, etc)

* main purposes for which corpus is used/intended

6. Availability of corpus

* restricted to the research team

* available for purchase

* access to corpus available on-line

* free on request

* available by FTP

* Other (please specify)

7. Alignment

* not aligned at all

* aligned by large unit (please specify)

* aligned by paragraph

* aligned by sentence

* Other (please specify)

8. Method of Alignment

* wholly by hand (please indicate any special software, techniques, etc)

* partial use of alignment software (please specify which)

* total use of alignment software (please specify which)

9. Hardware used to store the corpus

* Type of computer

* Operating system (unix, vms, dos, etc)

* One computer or more than one or a network?

10. Software used to access the corpus

* Monolingual Concordancing software (please specify)

* Bilingual / multilingual Concordancing software (please specify)

* Database software (please specify)

* Standard utilities (grep, etc) (please specify)

* Other (please specify)

11. Source and availability of access software

* Specially developed within the project

* Commercial software (state source and price)

* Shareware (state source)

* Other (please specify)

12. Publications emanating from the project

13. Future plans, dreams, proposals, etc.

14. Any other information

*** Thank you for your help ***