[sword-devel] XML Entities (was Re: New Release)

Tom Sullivan info at beforgiven.info
Thu Aug 29 12:59:42 MST 2019


Y'all:

I should note that this is an old problem - I had just thought that it 
was a client-side issue. BibleTime, Xiphos, and diatheke all behave the 
same way. The only system I use is Debian Buster.

If I type osis2mod, it reports its version as:
You are running osis2mod: $Rev: 3431 $
According to my package info: libsword 1.8.1

I think that answers what I can.
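
For anyone following along, the conversion mentioned in the quoted 
message below (numeric character references to literal UTF-8) can be 
sketched roughly like this. It is only a minimal illustration under 
stated assumptions, not the actual script: the name decode_numeric_refs 
is hypothetical, and it assumes that references to the five characters 
with syntactic meaning in XML should stay escaped.

```python
import re

# Characters that have syntactic meaning in XML. References to these
# stay escaped (an assumption of this sketch) so the markup remains
# well-formed.
XML_RESERVED = {"&", "<", ">", '"', "'"}

# Matches decimal (&#1488;) and hexadecimal (&#x5D0;) character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace numeric character references with the literal characters.

    Hypothetical helper for illustration; not part of Sword or osis2mod.
    """

    def replace(match):
        hex_digits, dec_digits = match.groups()
        try:
            code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
            char = chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid code point: leave the reference untouched.
            return match.group(0)
        if char in XML_RESERVED:
            # Keep &#38; and friends escaped.
            return match.group(0)
        return char

    return NUMERIC_REF.sub(replace, text)


# Hebrew alef and bet, curly quotes, and an escaped ampersand.
sample = "<w>&#1488;&#x5D1;</w> &#8220;quoted&#8221; A &#38; B"
converted = decode_numeric_refs(sample)
print(converted)
```

The result would then be written back out as UTF-8 (for example with 
open(path, "w", encoding="utf-8")) before running osis2mod on it.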

Thanks,

Tom

Tom Sullivan
info at BeForgiven.INFO
FAX: 815-301-2835
---------------------
Great News!

On 8/29/19 1:25 PM, Greg Hellings wrote:
> Let's start a new thread for unrelated replies
> 
> On Thu, Aug 29, 2019 at 10:49 AM Tom Sullivan <info at beforgiven.info 
> <mailto:info at beforgiven.info>> wrote:
> 
>     Y'all:
> 
>     This is a bit late, but I have just found something odd for which
>     I have no explanation.
> 
>     Due to the way some Python modules handle XML, non-ASCII characters may
>     be converted to the form &#xxxx; where xxxx is a decimal Unicode
>     code point. Such characters, for example Hebrew letters, do not appear
>     in diatheke output. They also do not show in Xiphos. They *do* appear
>     in BibleDesktop, which uses JSword.
> 
> 
> This tells us that they are surviving the import process intact and 
> aren't being completely stripped or lost by osis2mod.
> 
> Have you tried other programs in the Sword pedigree and outside of 
> JSword? Have you tried BibleTime or Ezra or The SWORD Project for 
> Windows or Bishop? There are several places this could fall down, and if 
> the engine is preserving the content during import, then the falling 
> down could be in the engine, or possibly in the display layer somewhere. 
> More info can help track it down.
> 
> 
>     A Python program to convert all such characters to actual UTF-8 Hebrew
>     solves the problem. (I do use the -N option in osis2mod.)
> 
> 
> This means it's not a UTF-8 issue; more likely the entity is being 
> stripped out somewhere in the engine->application->display widget 
> pipeline.
> 
> 
>     Related to this, I have noticed for a long time that some other
>     characters such as curly quotes also do not appear in diatheke or
>     Xiphos.
> 
>     Obviously, some characters *cannot* appear literally in XML because
>     they have syntactical meaning in XML. Thus they must use the &#xxxx;
>     format or another escape method. So how should they be handled so
>     that osis2mod works with them?
> 
>     Advice? Comments? Could this be something that could be fixed in Sword?
> 
> 
> If it lives in Sword. More info would be needed to determine that.
> 
> --Greg
> 
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
> 


