[sword-devel] XML Entities (was Re: New Release) Demo
Tom Sullivan
info at beforgiven.info
Fri Aug 30 06:06:32 MST 2019
Y'all:
Attached is a demo of the problem where &#xxxx; encoded Hebrew does not
show. Both a sample of the problem and UTF-8 encoded Hebrew are shown
for contrast. The zip has the xml and the resultant module. The format
is that of a commentary containing only Genesis 1:1.
Tom
Tom Sullivan
info at BeForgiven.INFO
FAX: 815-301-2835
On 8/29/19 1:25 PM, Greg Hellings wrote:
> Let's start a new thread for unrelated replies
>
> On Thu, Aug 29, 2019 at 10:49 AM Tom Sullivan <info at beforgiven.info> wrote:
>
> Y'all:
>
> This is a bit late, but I have just found something odd for which I have
> no explanation.
>
> Due to the way some Python modules handle XML, non-ASCII characters may
> be converted to the form &#xxxx; where xxxx is the character's decimal
> Unicode code point. Such characters, for example Hebrew letters, do not
> appear in diatheke output. They also do not show in Xiphos. They *do*
> appear in BibleDesktop, which uses JSword.
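
One way such references can arise, assuming the standard library's
xml.etree.ElementTree is among the modules involved (the message above does
not name them): its tostring() defaults to US-ASCII output and writes every
non-ASCII character as a decimal &#NNNN; reference. A minimal sketch; the
element and variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# Hebrew "bereshit", written with escapes so the source stays plain ASCII.
elem = ET.fromstring("<p>\u05d1\u05e8\u05d0\u05e9\u05d9\u05ea</p>")

# Default output encoding is US-ASCII, so each Hebrew letter is emitted
# as a decimal character reference such as &#1489;
ascii_bytes = ET.tostring(elem)

# Requesting a str (encoding="unicode") keeps the letters literal;
# encoding that str as UTF-8 gives bytes with no character references.
literal_text = ET.tostring(elem, encoding="unicode")
utf8_bytes = literal_text.encode("utf-8")

print(ascii_bytes)
print(literal_text)
```

If this is the cause, serializing with encoding="unicode" (or "utf-8") in
the generating script may avoid the references altogether.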
>
>
> This tells us that they are surviving the import process intact and
> aren't being completely stripped or lost by osis2mod.
>
> Have you tried other programs in the Sword pedigree and outside of
> JSword? Have you tried BibleTime or Ezra or The SWORD Project for
> Windows or Bishop? There are several places this could fall down, and if
> the engine is preserving the content during import, then the falling
> down could be in the engine, or possibly in the display layer somewhere.
> More info can help track it down.
>
>
> A Python program to convert all such characters to actual UTF-8 Hebrew
> solves the problem. (I do use the -N option in osis2mod.)
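
A rough sketch of that kind of conversion follows. It is not the actual
program, only an illustration under stated assumptions: the function name
expand_char_refs is hypothetical, references in the ASCII range are left
untouched so that escaped markup characters such as &#38; and &#60; stay
escaped, and CDATA sections and comments get no special handling.

```python
import re

# Matches decimal (&#1488;) and hexadecimal (&#x5D0;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(xml_text):
    """Replace non-ASCII numeric character references with literal characters.

    Hypothetical helper: ASCII-range references are kept as they are, so
    characters with syntactic meaning in XML remain escaped.
    """
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point < 128:
            return match.group(0)  # leave &#38;, &#60; and similar alone
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            return match.group(0)  # not a valid code point; keep as written

    return CHAR_REF.sub(replace, xml_text)


sample = "<p>&#1489;&#1512;&#1488; &amp; &#60;</p>"
converted = expand_char_refs(sample)
print(converted)
```

The result would then be written back out with encoding="utf-8" before
running osis2mod on it.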
>
>
> This means it's not a UTF-8 issue; more likely the entity is being
> stripped out somewhere in the engine->application->display widget
> pipeline.
>
>
> Related to this, I have noticed for a long time that some other
> characters such as curly quotes also do not appear in diatheke or
> Xiphos.
>
> Obviously, some characters *cannot* appear literally in XML because they
> have syntactic meaning in XML. Thus they must use the &#xxxx; format or
> another escape method. So how should they be handled so that osis2mod
> works with them?
>
> Advice? Comments? Could this be something that could be fixed in Sword?
>
>
> If it lives in Sword. More info would be needed to determine that.
>
> --Greg
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: noseeum.zip
Type: application/zip
Size: 3333 bytes
Desc: not available
URL: <http://www.crosswire.org/pipermail/sword-devel/attachments/20190830/d0951f2f/attachment.zip>