Corpora: Language in Patent Texts

From: neumann (
Date: Tue May 28 2002 - 03:58:10 MET DST


    I work on a commercial Japanese-English MT system and would like to know
    whether anybody can point me to literature or research on the
    characteristics of the language used in patent or trademark
    specifications (= texts), such as those inventors submit to national or
    international patent offices like the EPO in Munich.

    Rather than the normative "author's guidelines" provided by patent
    offices, which describe the fixed, formalised structure and idioms of
    such texts, I am interested in their general linguistic and statistical
    properties (e.g. almost no use of proper nouns; anaphoric relations;
    a statistical preference for gerund clauses over relative clauses with
    an inflected verb; for English, a preference for Latin-origin words
    over Germanic-origin words; etc.).

    Are tagged corpora available anywhere (even as part of larger
    collections, e.g. of legal texts, and for any source language)?

    I will post a summary of your replies to this list.

    Thank you!

    Dr. Christoph Neumann
    R&D MT, Nova Inc.
    Tokyo, Japan

    This archive was generated by hypermail 2b29 : Tue May 28 2002 - 04:06:31 MET DST