[sword-devel] XML Entities (was Re: New Release) Demo

Tom Sullivan info at beforgiven.info
Fri Aug 30 06:06:32 MST 2019


Y'all:

Attached is a demo of the problem where &#xxxx; encoded Hebrew does not 
show. Both a sample of the problem and UTF-8 encoded Hebrew are shown 
for contrast. The zip has the xml and the resultant module. The format 
is that of a commentary containing only Genesis 1:1.
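
For anyone who wants to reproduce the workaround, here is a minimal sketch of the kind of conversion I mean. It is not the exact script I use; the function name decode_numeric_refs and the file names in the note below are placeholders made up for illustration. It turns decimal and hexadecimal numeric character references into literal characters, and it assumes that references to the five XML-significant characters (& < > " ') should stay escaped so the file remains well-formed.

```python
import re

# Matches decimal (&#1489;) and hexadecimal (&#x5D1;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Code points of & < > " ' which are left escaped (assumption: the output
# must remain well-formed XML that osis2mod can parse).
XML_SPECIAL = {0x26, 0x3C, 0x3E, 0x22, 0x27}


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they name."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave XML-significant characters and out-of-range values untouched.
        if code_point in XML_SPECIAL or code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return NUMERIC_REF.sub(replace, text)
```

To apply it to a file, read the source with open("in.xml", encoding="utf-8"), pass the text through decode_numeric_refs, and write the result with open("out.xml", "w", encoding="utf-8"). If the references come from xml.etree.ElementTree, note that ET.tostring() defaults to the us-ascii encoding, which is what produces the &#xxxx; form; passing encoding="unicode" or encoding="utf-8" may avoid the problem at the source.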

Tom

Tom Sullivan
info at BeForgiven.INFO
FAX: 815-301-2835


On 8/29/19 1:25 PM, Greg Hellings wrote:
> Let's start a new thread for unrelated replies
> 
> On Thu, Aug 29, 2019 at 10:49 AM Tom Sullivan <info at beforgiven.info> 
> wrote:
> 
>     Y'all:
> 
>     This is a bit late, but I have just found something odd for which I
>     have no explanation.
> 
>     Due to the way some Python modules handle XML, non-ASCII characters may
>     be converted to the form &#xxxx; where xxxx is the decimal Unicode
>     code point. Such characters, for example Hebrew letters, do not appear
>     in diatheke output. They also do not show in Xiphos. They *do* appear
>     in BibleDesktop, which uses JSword.
> 
> 
> This tells us that they are surviving the import process intact and 
> aren't being completely stripped or lost by osis2mod.
> 
> Have you tried other programs in the Sword pedigree and outside of 
> JSword? Have you tried BibleTime or Ezra or The SWORD Project for 
> Windows or Bishop? There are several places this could fall down, and if 
> the engine is preserving the content during import, then the falling 
> down could be in the engine, or possibly in the display layer somewhere. 
> More info can help track it down.
> 
> 
>     A Python program to convert all such characters to actual UTF-8 Hebrew
>     solves the problem. (I do use the -N option in osis2mod.)
> 
> 
> This means it's not a UTF-8 issue; more likely the entity is being 
> stripped out somewhere in the engine->application->display widget 
> pipeline.
> 
> 
>     Related to this, I have noticed for a long time that some other
>     characters such as curly quotes also do not appear in diatheke or
>     Xiphos.
> 
>     Obviously, some characters *cannot* appear literally in XML because
>     they have syntactic meaning in XML. Thus they must use the &#xxxx;
>     format or another escape method. So how should they be handled so
>     that osis2mod works with them?
> 
>     Advice? Comments? Could this be something that could be fixed in Sword?
> 
> 
> If it lives in Sword. More info would be needed to determine that.
> 
> --Greg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: noseeum.zip
Type: application/zip
Size: 3333 bytes
Desc: not available
URL: <http://www.crosswire.org/pipermail/sword-devel/attachments/20190830/d0951f2f/attachment.zip>

