[sword-devel] XML Entities (was Re: New Release)
info at beforgiven.info
Thu Aug 29 12:59:42 MST 2019
I should note that this is an old problem - I had just thought that it
was a client-side issue. BibleTime, Xiphos, and diatheke all respond the
same. I use only Debian Buster on my computer.
If I type osis2mod, I get for the version:
You are running osis2mod: $Rev: 3431 $
According to my package info: libsword 1.8.1
I think that answers what I can.
info at BeForgiven.INFO
On 8/29/19 1:25 PM, Greg Hellings wrote:
> Let's start a new thread for unrelated replies
> On Thu, Aug 29, 2019 at 10:49 AM Tom Sullivan <info at beforgiven.info
> <mailto:info at beforgiven.info>> wrote:
> This is a bit late, but I have just found something odd for which I
> have no explanation.
> Due to the way some Python modules handle XML, non-ASCII characters may
> be converted to the form &#xxxx; where xxxx is the decimal Unicode
> code point of the character. Such characters, such as Hebrew letters,
> do not appear in diatheke output. They also do not show in Xiphos. They
> *do* appear in BibleDesktop, which uses JSword.
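
As an illustration of how such references can arise (an assumption about the tooling, since the message does not name the Python modules involved): the standard library's xml.etree.ElementTree serializes to us-ascii by default and replaces every non-ASCII character with a decimal character reference. The element name "w" below is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical element holding a single Hebrew letter (alef, U+05D0).
elem = ET.Element("w")
elem.text = "\u05d0"

# Default serialization targets us-ascii, so the letter becomes &#1488;.
ascii_bytes = ET.tostring(elem)

# Asking for a Python str instead keeps the character itself.
unicode_text = ET.tostring(elem, encoding="unicode")

print(ascii_bytes)
print(unicode_text)
```

If this is the cause, requesting encoding="unicode" and writing the result out as UTF-8 would avoid the references at the source.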
> This tells us that they are surviving the import process intact and
> aren't being completely stripped or lost by osis2mod.
> Have you tried other programs in the Sword pedigree and outside of
> JSword? Have you tried BibleTime or Ezra or The SWORD Project for
> Windows or Bishop? There are several places this could fall down, and if
> the engine is preserving the content during import, then the falling
> down could be in the engine, or possibly in the display layer somewhere.
> More info can help track it down.
> A Python program to convert all such characters to actual UTF-8 Hebrew
> solves the problem. (I do use the -N option in osis2mod.)
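
A minimal sketch of such a conversion, not the program referred to above: it expands decimal and hexadecimal character references into the characters themselves, but leaves references to XML-significant characters (such as &#38; for the ampersand) untouched so the result stays well-formed. The function name expand_char_refs is hypothetical.

```python
import re

# Characters with syntactic meaning in XML; references to these stay escaped.
XML_SPECIAL = set("<>&\"'")

# Matches decimal (&#1488;) and hexadecimal (&#x5d0;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(text):
    """Replace numeric character references with the actual characters."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: leave the reference as it was.
            return match.group(0)
        if char in XML_SPECIAL:
            return match.group(0)
        return char

    return CHAR_REF.sub(replace, text)


print(expand_char_refs("&#1488;&#1489; &#38; &#8220;quoted&#8221;"))
```

The returned string should then be written with an explicit UTF-8 encoding, for example open(path, "w", encoding="utf-8"), before it is handed to osis2mod.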
> This means it's not a UTF-8 issue, but probably an issue where, somewhere
> in the engine->application->display widget pipeline, the entity is being
> stripped out.
> Related to this, I have noticed for a long time that some other
> characters such as curly quotes also do not appear in diatheke or
> Obviously, some characters *cannot* appear in XML because they have
> syntactical meaning in XML. Thus they must use the &#xxxx; format or
> other escape method. So how should they be handled so osis2mod works
> with them?
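
For reference, a small sketch of which characters actually need escaping in XML text, using Python's standard xml.sax.saxutils.escape. It says nothing about how osis2mod or the Sword engine treat the result. The sample string is made up for illustration.

```python
from xml.sax.saxutils import escape

# Made-up sample: markup characters, curly quotes, and a Hebrew word.
sample = "Tom & Jerry <said> \u201c\u05e9\u05dc\u05d5\u05dd\u201d"

# escape() rewrites only &, < and >; everything else passes through.
escaped = escape(sample)

print(escaped)
```

Curly quotes and Hebrew letters pass through unchanged, so in a UTF-8 encoded file they can be written literally rather than as &#xxxx; references.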
> Advice? Comments? Could this be something that could be fixed in Sword?
> If it lives in Sword. More info would be needed to determine that.