[sword-devel] Old Church Slavonic

David Haslam dfhmch at googlemail.com
Wed May 20 08:19:53 MST 2015


I should have added that the PDF files use embedded fonts with a custom,
non-Unicode encoding.

Not only would the text need to be extracted carefully, but the encoding
would also need to be reverse-engineered before the text could be converted
to Unicode.
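
A minimal sketch of the last step in Python, assuming the custom encoding
has already been worked out as a table from font codes to Unicode
characters. The names LEGACY_TO_UNICODE, legacy_to_unicode and
unmapped_codes, and every entry in the table, are hypothetical placeholders,
not the real encoding of these fonts.

```python
# Hypothetical table: the codes and target characters below are
# placeholders only. Reverse-engineering the embedded fonts would
# supply the real entries, one per glyph in the custom encoding.
LEGACY_TO_UNICODE: dict[int, str] = {
    0x61: "\u0430",  # placeholder -> CYRILLIC SMALL LETTER A
    0x62: "\u0431",  # placeholder -> CYRILLIC SMALL LETTER BE
    0x76: "\u0463",  # placeholder -> CYRILLIC SMALL LETTER YAT
    0x7E: "\u0483",  # placeholder -> COMBINING CYRILLIC TITLO
}


def legacy_to_unicode(extracted: str) -> str:
    """Map each character of the extracted text through the table.

    Characters without an entry pass through unchanged, so gaps in
    the reverse-engineered table remain visible in the output.
    """
    return extracted.translate(LEGACY_TO_UNICODE)


def unmapped_codes(extracted: str) -> set[str]:
    """Return the non-space characters the table does not yet cover."""
    return {
        ch
        for ch in extracted
        if ord(ch) not in LEGACY_TO_UNICODE and not ch.isspace()
    }


if __name__ == "__main__":
    sample = "ab~ z"
    print(legacy_to_unicode(sample))
    print(unmapped_codes(sample))
```

str.translate also accepts multi-character replacement strings, so a single
legacy glyph that stands for a letter plus combining marks can map to that
whole sequence. Glyph sequences that together form one Unicode character
would need more than a character-by-character table.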

Extracting text from PDF files may seem a simple task, but Adobe Reader does
not do it losslessly.
I think you'd need the full Adobe Acrobat, or non-Adobe software, to
extract the text accurately.

David







