[sword-devel] MODTOOLS-17 To osis2mod, added conversion of hex and decimal numeric entities to UTF-8, with special handling of <, >, &, ', and ".

Nathan Phillip Brink ohnobinki at ohnopublishing.net
Thu Aug 7 01:27:28 EDT 2025


Hello David,

Prior to DM’s fix, “&x20;” would be interpreted as “ ” even though it is 
invalid syntax. Also, “&#x20;” would incorrectly be treated as an error.

After DM’s fix, “&x20;” is properly treated as an error, with the 
ampersand handled as if it were meant to be a literal ampersand in the 
text, and “&#x20;” is properly treated as “ ”. You can compare this 
behavior to that of a web browser, which uses similar technology and 
behaves the same, by checking this jsfiddle: 
https://jsfiddle.net/binki/mwnLv49f/ .
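
To make the distinction concrete, here is a rough sketch of the behavior 
as I understand it from the Jira issue. It is not the osis2mod code: the 
names decodeNumericEntities, parseNumeric, and toUtf8 are made up for 
illustration. It assumes that the 32-character cap counts from the 
ampersand through the semicolon, that anything which is not a 
well-formed numeric reference is copied through unchanged, and that the 
five special characters stay escaped everywhere (it does not track 
attribute context).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode one Unicode code point (<= 0x10FFFF) as UTF-8.
static std::string toUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Parse the text between "&#" and ";" ("32", "x20", "x0020", ...).
// Leading zeros are accepted. Returns false if the body is malformed.
static bool parseNumeric(const std::string& body, std::uint32_t& cp) {
    std::size_t i = 0;
    unsigned base = 10;
    if (!body.empty() && body[0] == 'x') { base = 16; i = 1; }
    if (i == body.size()) return false;  // no digits at all
    std::uint32_t value = 0;
    for (; i < body.size(); ++i) {
        const char c = body[i];
        unsigned digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;
        value = value * base + digit;
        if (value > 0x10FFFF) return false;  // beyond the Unicode range
    }
    if (value == 0 || (value >= 0xD800 && value <= 0xDFFF)) return false;
    cp = value;
    return true;
}

// Replace well-formed numeric references with UTF-8; keep the five
// special characters escaped; copy everything else through unchanged.
std::string decodeNumericEntities(const std::string& in) {
    const std::size_t kMaxEntityLength = 32;  // assumed to include '&' and ';'
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '&') { out += in[i++]; continue; }
        const std::size_t semi = in.find(';', i);
        std::uint32_t cp = 0;
        const bool numeric = semi != std::string::npos
            && semi - i + 1 <= kMaxEntityLength
            && in.compare(i, 2, "&#") == 0
            && parseNumeric(in.substr(i + 2, semi - i - 2), cp);
        if (!numeric) { out += in[i++]; continue; }  // e.g. "&x20;" stays as text
        switch (cp) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += toUtf8(cp);
        }
        i = semi + 1;
    }
    return out;
}
```

With this sketch, decodeNumericEntities("a&#x20;b") gives "a b", while 
"a&x20;b" comes back unchanged.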

Sorry, I am just reading the Jira issue and the code—I don’t have a 
build environment set up so I can’t actually test it. But it looks to me 
like the changes DM made do indeed fix something here.

Thanks.


On 8/7/2025 12:24 AM, David Haslam wrote:
> Hi DM,
>
> I’m puzzled.
>
> You seem to have thought there was a bug which actually wasn’t.
>
> Please refer to 
> https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references 
>
>
> The # was not a bug !
>
> Regards
>
> David
>
>
> On Wed, Aug 6, 2025 at 22:20, DM Smith <dmsmith at crosswire.org> wrote:
>> I’ve just checked in a change for osis2mod.
>>
>> MODTOOLS-17 To osis2mod, added conversion of hex and decimal numeric 
>> entities to UTF-8, with special handling of <, >, &, ', and ".
>>
>> Also:
>> * Fixed a bug in hex numeric entities which defined &xHHHH; rather 
>> than &#xHHHH;.
>> * Added entity sanity check of maximum length of 32.
>> * Refactored entity handling into handleEntities and comment handling 
>> into handleComments.
>> * Changed t_entitytype and t_commentstate into class enums EntityType 
>> and CommentState.
>> * Added -d 1024 for entity and comment parsing.
>>
>> Note: The coding allows for 0 padding of the numeric entities.
>> Note: The 5 need to be treated specially.
>> &amp; or &#x26; → &amp;
>> &lt; or &#x3C; → &lt;
>> &gt; or &#x3E; → &gt;
>> &quot; or &#x22; → &quot; or "
>> &apos; or &#x27; → &apos; or '
>> When converted to these forms, &quot; should be transformed into " 
>> except in attributes using " and likewise &apos; into ' except in 
>> attributes having '.
>>
>> I need to update the wiki to match.
>>
>> In Him,
>> DM Smith
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page