[sword-devel] XML Entities (was Re: New Release)
greg.hellings at gmail.com
Thu Aug 29 10:25:05 MST 2019
Let's start a new thread for unrelated replies.
On Thu, Aug 29, 2019 at 10:49 AM Tom Sullivan <info at beforgiven.info> wrote:
> This is a bit late, but I have just found something odd for which I have
> no explanation.
> Due to the way some Python modules handle XML, non-ascii characters may
> be converted to the form &#xxxx; where xxxx is a decimal Unicode
> code point. Such characters, such as Hebrew letters, do not appear in
> diatheke output. They also do not show in Xiphos. They *do* appear in
> BibleDesktop which uses jsword.
This tells us that they are surviving the import process intact and aren't
being completely stripped or lost by osis2mod.
Have you tried other programs in the Sword pedigree and outside of JSword?
Have you tried BibleTime or Ezra or The SWORD Project for Windows or
Bishop? There are several places this could fall down, and if the engine is
preserving the content during import, then the falling down could be in the
engine, or possibly in the display layer somewhere. More info can help
track it down.
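Tom doesn't say which Python module his scripts use. As one illustration only (an assumption, not necessarily his setup), the standard library's ElementTree writes exactly these decimal references when it serializes with its default "us-ascii" encoding. Either serialization parses back to identical text, which is why a correct XML consumer should show the Hebrew either way:

```python
import xml.etree.ElementTree as ET

# A <w> element holding Hebrew alef (U+05D0) and a left curly quote (U+201C).
elem = ET.fromstring("<w>\u05d0\u201c</w>")

# tostring() defaults to "us-ascii", so non-ASCII characters are written
# as decimal numeric character references (&#1488; and &#8220;).
ascii_bytes = ET.tostring(elem)

# With encoding="unicode" the same characters are written literally.
unicode_str = ET.tostring(elem, encoding="unicode")

# After parsing, both forms yield exactly the same text content.
same_text = ET.fromstring(ascii_bytes).text == ET.fromstring(unicode_str).text
```

If a front-end shows the literal form but not the referenced form, something after the XML layer is not resolving the references.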
> A Python program to convert all such characters to actual UTF-8 Hebrew
> solves the problem. (I do use the -N option in osis2mod.)
This means it's not a UTF-8 issue, but probably an issue somewhere in
the engine->application->display widget pipeline where the entity is being
dropped.
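Tom's conversion script wasn't posted. A minimal sketch of that kind of pre-pass, assuming the OSIS file has already been read into a Python string (the function name decode_numeric_refs is hypothetical), might look like this. It works on raw text with a regex rather than a full parser, so it would also rewrite reference-like text inside comments or CDATA sections:

```python
import re

# Characters with markup meaning in XML. If a numeric reference resolves to
# one of these, emit the predefined entity instead so the output stays
# well-formed (including inside attribute values).
MARKUP_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

# Matches hexadecimal (&#x05D0;) and decimal (&#1488;) character references.
NUMERIC_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the literal characters."""

    def repl(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        ch = chr(code)
        return MARKUP_ENTITIES.get(ch, ch)

    return NUMERIC_REF.sub(repl, text)
```

For example, decode_numeric_refs("<w>&#1488;&#1489;</w>") returns "<w>אב</w>", while "&#60;" becomes "&lt;" rather than a bare "<". Named entities such as &amp; are left alone.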
> Related to this, I have noticed for a long time that some other
> characters such as curly quotes also do not appear in diatheke or Xiphos.
> Obviously, some characters *cannot* appear in xml because they have
> syntactical meaning in xml. Thus they must use the &#xxxx; format or
> other escape method. So how should they be handled so osis2mod works
> with them?
> Advice? Comments? Could this be something that could be fixed in Sword?
It could, if the problem lives in Sword. More info would be needed to
determine that.
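On the question of which characters actually need escaping: in XML character data only "&" and "<" must be escaped (">" conventionally is too), and in attribute values the quote character used as the delimiter. Curly quotes are not XML syntax, so they can stay as literal UTF-8. A small sketch using Python's standard xml.sax.saxutils helpers shows this:

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# Curly quotes (U+201C, U+201D) around text containing two markup characters.
text = "\u201cA < B & C\u201d"

# escape() rewrites only &, < and > by default; the curly quotes pass through.
escaped = escape(text)

# quoteattr() produces a quoted attribute value. Curly quotes are not
# attribute delimiters, so they are left untouched here as well.
attr = quoteattr("\u201chi\u201d")

# unescape() reverses escape(), recovering the original string.
round_trip = unescape(escaped)
```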