I work on a commercial Japanese-English MT system and would like to know
whether somebody could point me to literature or research on the
characteristics of the language used in patent or trademark specifications
(= texts) of the kind that inventors submit to national or international
patent offices such as the EPO in Munich.
Rather than the normative "author's guidelines" provided by patent
offices, which describe the fixed, formalised structure and idioms of
such texts, I am interested in their general linguistic and statistical
properties (e.g. almost no use of proper nouns; anaphoric relations;
a statistical preference for gerund clauses over relative clauses with
an inflected verb; for English, a preference for words of Latin origin
over words of Germanic origin; etc.).
Are tagged corpora available somewhere (even as part of larger collections,
e.g. of legal texts, and for any source language)?
I will post a summary of your replies on this list.
--
Dr. Christoph Neumann
email@example.com
R&D MT, Nova Inc.
Tokyo, Japan
http://www.nova.co.jp/english/index.html