[sword-devel] Spelling (was Versification/Encoding issues)

Peter von Kaehne refdoc at gmx.net
Thu Jan 8 17:09:36 MST 2009


Mike Hart wrote:
> That's interesting, because ancle is one of the words I corrected in
> JSFB -- the OCR had ancle, but the PDF itself, my paper KJV copy, and
> my JPS complete Tanach (individual volumes) had ankle...  I can't say
> what verse it was, at the time I was hunting for e's that had been
> OCR'd into c's  (search for 'regular expression'
> [bcdfghjklmnpqrstvwxy]c[bcdfgjklmnpqrstvwx] in kwrite)

You should have a look at Troy's work with Tesseract. Rather than running
search and replace over badly OCR'd text, he seems to have figured out how
to "educate" Tesseract with one or two sample pages until it does the right
thing. That might be much easier for you too, and give a better result in
the long term.
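
For reference, the kind of search Mike describes can be sketched in Python
roughly as follows. This is only an illustration, assuming the pattern is
applied word by word to lower-cased text. The function name suspect_words
and the sample sentence are made up here; the character classes are copied
verbatim from the quoted message.

```python
import re

# Pattern from the quoted message: a 'c' with a consonant on either side.
# The two character classes are copied as-is (the second one has no 'h' or 'y').
SUSPECT = re.compile(r"[bcdfghjklmnpqrstvwxy]c[bcdfgjklmnpqrstvwx]")


def suspect_words(text):
    """Return the words in text that contain a consonant-c-consonant run."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if SUSPECT.search(w.lower())]


# Hypothetical sample line, not taken from any real module text.
sample = "the shephcrd hurt his ancle near the sanctuary"
print(suspect_words(sample))
```

The hits are only candidates: "shephcrd" is a real OCR error, but ordinary
words such as "sanctuary" match as well, so every hit still has to be
checked by hand against the printed text.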

Peter


