Corpora: German OCR and translation software

by way of Loren A. BILLINGS (jeff_allen@juno.com)
Sun, 9 Nov 1997 21:26:36 +0100

Dear colleagues,

I was sent the following request and am unable to answer it. Might one of
you be able to? Please reply directly to Jeff Allen (<jeff_allen@juno.com>
or <jeffa@cs.cmu.edu>). Thank you. --Loren Billings

A friend would like to scan in German theological texts
and then use a PC translation system to translate them into
English. Can you try to find out the following info for me?

1) What is the best OCR (Optical Character Recognition)
accuracy for German and what software is it run with? I have
been using MAC Omnipage and MAC Read-it-OCR for
scanning French Creole text lately. Even after manually
training the software to recognize the printed text, I usually
get 80% recognition on French Creole newspaper text that
has been enlarged 110% from 10 to 12 pt print. Some texts
are 70% recognition and from time to time I am lucky to
have 90% recognition. I normally get 90 - 92% recognition
on French Creole texts that are in a monospace 12 pt print
that is double-spaced. That is for Creole. There are possibly
some OCR packages out there for German, but I do not know
what they are or how good they are. A PC version would
be what my friend is looking for.

2) My friend mentions Globalink Machine Translation below. It's
quite a good package at 140 dollars. I tested the English/French
version at [...]. Just wondering if you know anyone
who would have an idea how good commercial German -->
English PC translation packages are.

With only 70-80% character recognition, I doubt that scanning
texts to run through OCR and then through a PC MT system would
be worth the effort. If anyone in Germany can attest to
OCR being in the 90% range, then it may be worth investigating
further.
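
A rough way to see why 70-80% character recognition is a concern
for MT input is to turn it into a word-level figure. The sketch
below assumes that recognition errors hit each character
independently, which is a simplification and not something
measured above; the function name word_accuracy and the 6-letter
word length are illustrative choices only.

```python
def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Chance that every character of a word is recognized correctly.

    Assumption (not from the post): errors are independent per character.
    """
    return char_accuracy ** word_length


# Recognition rates mentioned in the post, applied to a 6-letter word.
for rate in (0.70, 0.80, 0.90):
    intact = word_accuracy(rate, 6)
    print(f"{rate:.0%} per character -> about {intact:.0%} of 6-letter words intact")
```

Under that assumption, roughly a quarter of 6-letter words would
come through intact at 80% and about half at 90%, so most of the
text would still need manual correction before an MT system could
use it.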

Thanks in advance if you can find out anything on this for me.