Corpora: Re: definitions of controlled languages

Pim van der Eijk (pim.vander.eijk@capgemini.nl)
Wed, 07 Apr 1999 09:28:11 +0200

Colleagues and I have used the distinctions "natural" versus "artificial", and
"descriptive" versus "prescriptive" to differentiate sublanguage from controlled
language (see e.g. our CLAW 96 paper).

There is an extensive literature on sublanguages (mostly from the 1980s, but
still very useful today) that discusses methods for sublanguage analysis. You
can apply these and other methods (e.g. terminology work, corpus studies) to a
particular set of documents, e.g. the technical documentation of a particular
company. You may then find (in lexicon analysis) that in a particular domain one
concept is expressed using both word A and word B, and that word C is used to
express two distinct concepts, which are also expressed unambiguously by words D
and E respectively.
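As an illustration only, here is a minimal Python sketch of the lexicon-analysis
step. It assumes a hypothetical input in which every word occurrence in the
corpus has already been tagged with the concept it expresses (producing that
tagging is the real work, and is not shown). The function name
analyse_lexicon and the concept labels X, Y, Z are invented for this example.

```python
from collections import defaultdict

# Hypothetical annotated lexicon: (word, concept) pairs observed in a corpus.
# Words A-E mirror the example above; concepts X, Y, Z are invented labels.
OBSERVATIONS = [
    ("A", "X"), ("B", "X"),   # concept X is expressed by both A and B
    ("C", "Y"), ("C", "Z"),   # word C expresses two distinct concepts
    ("D", "Y"), ("E", "Z"),   # D and E express Y and Z unambiguously
]


def analyse_lexicon(pairs):
    """Return (concepts with several words, words with several concepts)."""
    words_by_concept = defaultdict(set)
    concepts_by_word = defaultdict(set)
    for word, concept in pairs:
        words_by_concept[concept].add(word)
        concepts_by_word[word].add(concept)
    # Synonymy: one concept, more than one word.
    synonyms = {c: sorted(ws) for c, ws in words_by_concept.items() if len(ws) > 1}
    # Ambiguity: one word, more than one concept.
    ambiguous = {w: sorted(cs) for w, cs in concepts_by_word.items() if len(cs) > 1}
    return synonyms, ambiguous


if __name__ == "__main__":
    synonyms, ambiguous = analyse_lexicon(OBSERVATIONS)
    print(synonyms)   # {'X': ['A', 'B'], 'Y': ['C', 'D'], 'Z': ['C', 'E']}
    print(ambiguous)  # {'C': ['Y', 'Z']}
```

Note that concepts Y and Z also show up with two words each, because the
ambiguous word C competes with D and E; that overlap is exactly what a
controlled language would try to remove.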

Separately from this, an analysis of that company's business needs may show that
the existing sublanguage practice needs to be changed, for instance to
accommodate quality requirements of their customers (such as improving
consistency or reducing ambiguity), to improve reusability of documentation
modules, or because the company wants to use an MT system that has limitations
that need to be worked around.

In a controlled language, you want to translate these requirements into explicit
guidelines that authors and editors can take into account. For instance, you may
propose that authors should always use "B" and no longer use "A", or that "D" or
"E" should be used instead of "C". Some of these guidelines can be stated
explicitly and formally, so that they can be machine-checked. A problem with
many controlled language specifications is that, although the rules themselves
are stated fairly formally, they are defined relative to a sublanguage or to
general language that is itself not described formally (or is too vast to
describe).
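To show what "machine-checked" could mean for the simplest kind of guideline,
here is a minimal Python sketch of a term checker. It assumes the guidelines
reduce to a table mapping each deprecated word to its approved replacement(s);
the table GUIDELINES, the function check and the Violation record are
hypothetical names for this example. A real checker would also need
morphology, case handling and multi-word terms, none of which is attempted here.

```python
import re
from dataclasses import dataclass

# Hypothetical guideline table: deprecated word -> approved replacement(s).
# "A" -> "B" and "C" -> "D" or "E" mirror the example above.
GUIDELINES = {
    "A": ("B",),
    "C": ("D", "E"),
}


@dataclass
class Violation:
    line: int          # 1-based line number
    column: int        # 1-based column of the offending word
    word: str          # the deprecated word found
    suggestions: tuple  # approved replacements from the table


def check(text, guidelines=GUIDELINES):
    """Report each occurrence of a deprecated word (exact, case-sensitive match)."""
    violations = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in re.finditer(r"\w+", line):
            word = match.group()
            if word in guidelines:
                violations.append(
                    Violation(lineno, match.start() + 1, word, guidelines[word])
                )
    return violations


if __name__ == "__main__":
    for v in check("Turn A clockwise.\nCheck C and B."):
        print(f"{v.line}:{v.column}: '{v.word}' is deprecated; use {' or '.join(v.suggestions)}")
```

Such a check only works because the deprecated and approved words are listed
explicitly; it says nothing about whether the remaining text conforms to the
(informally described) sublanguage, which is the limitation noted above.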

Note that this refers to the use of "controlled language" in technical
documentation applications, as a kind of "controlled sublanguage". There is a
quite different use of controlled languages for formal specification, command
and control applications, etc. These controlled languages differ because there
is no existing sublanguage to start from, and because their syntax can depart
considerably from that of standard language.