[sword-devel] a new source for modules?

Jonathan Morgan jonmmorgan at gmail.com
Sun Aug 30 06:26:26 MST 2009


On Sun, Aug 30, 2009 at 7:41 AM, Chris Little <chrislit at crosswire.org> wrote:
> Peter von Kaehne wrote:
>>
>> Peter von Kaehne wrote:
>>>
>>> Just started to look around Google Books and saw the huge collection of
>>> public domain books scanned, OCRd and transformed into epub books.
>>>
>>
>> E.g. here Wesley's complete works:
>>
>>
>> http://books.google.co.uk/books?id=2tdhAAAAIAAJ&printsec=frontcover&dq=subject:%22+theology+%22&lr=&as_brr=1&ei=opmZSu6qDqCGygTxpIjODg&rview=1#v=onepage&q=&f=false
>>
>> Scanned, OCRed and as epub. A rudimentary genbook should be creatable
>> within a couple of hours, and once references etc. are inserted it could
>> be a valuable resource beyond what an epub reader can offer.
>>
>> Peter
>
> No doubt this is a step in the right direction, but I have the same
> misgivings as Matthew regarding OCRd and unproofed content.
>
> I popped the book you cite into Adobe Digital Editions to check the quality,
> and found most of the OCR problems we would expect to see:
> weird layout, non-Latin text appears as gibberish, and one (text) page I
> spotted was just presented inline as a scanned image.
>
> So, it's a good step, but the quality is pretty bad.

I agree too.  I am involved with a website that distributes a lot of
scanned and OCR'd works, and when I read some of them I think "How
could you seriously present that document to the world?"  For what
it's worth, Logos say that it is faster and better to type the content
in yourself than to OCR and then proofread and correct, and Logos
produces a lot of content.  I suspect that would certainly be true for
reasonably complex scripts and layouts, and quite possibly true even for
reasonably simple content if you have good typists.

Jon


