spell-checking corpora

lexicon@kant.irmkant.rm.cnr.it
Wed, 5 Apr 1995 11:03:49 GMT

I am working on building an Italian corpus. I already have some of the
texts in electronic form; the others I have had to scan.
I am not sure whether I need to proofread the WHOLE corpus for mistakes,
or whether running a spell-checker is enough. Does anyone know, even for
other languages, what percentage of scanner errors turn a word into another
valid word, rather than into a non-word that a spell-checker would catch?
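
To make the question concrete: one way I could estimate this figure on my
own data would be to proofread a small sample by hand and compare it with
the scanner output. The sketch below is only an illustration. The function
name, the toy lexicon, and the sample sentence are invented, and it assumes
the scanned and corrected texts line up word for word, which will not hold
when the scanner splits or merges words.

```python
def real_word_error_rate(ocr_tokens, true_tokens, lexicon):
    """Return (n_errors, n_real_word_errors, share) for two aligned token lists.

    A "real-word error" is a scanner error whose output is itself in the
    lexicon, so a spell-checker would not flag it.
    """
    if len(ocr_tokens) != len(true_tokens):
        raise ValueError("token lists must be aligned one-to-one")
    # Keep only the positions where the scanner output differs from the truth.
    errors = [(o, t) for o, t in zip(ocr_tokens, true_tokens) if o != t]
    # Of those, keep the ones where the wrong output is still a valid word.
    real_word = [(o, t) for o, t in errors if o.lower() in lexicon]
    share = len(real_word) / len(errors) if errors else 0.0
    return len(errors), len(real_word), share


# Toy data, invented purely for illustration.
lexicon = {"la", "casa", "cosa", "è", "bella", "molto"}
truth = "la casa è molto bella".split()
ocr = "la cosa è rnolto bella".split()

print(real_word_error_rate(ocr, truth, lexicon))  # (2, 1, 0.5)
```

Here "cosa" for "casa" passes the spell-checker while "rnolto" for "molto"
does not, so half of the errors in this toy sample would go unnoticed.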
Thank you!
Giorgina Brown
e-mail: lexicon@kant.irmkant.rm.cnr.it