[sword-devel] XML whitespace - significant and insignificant?
David Haslam
dfhdfh at protonmail.com
Sat Feb 9 07:56:32 MST 2019
Thanks DM.
Very helpful.
Any thoughts about XML ws before or after a note element ?
Clearly, a space just after a note should never be inadvertently removed.
And equally so, a space should never be inadvertently inserted just before a note.
I suspect some XML prettifiers may break one or both of these rules.
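
To make that concrete, here is a minimal sketch of the failure I have in mind. It uses Python's standard xml.dom.minidom as a stand-in for "some XML prettifier"; the verse fragment and the helper names pretty and text_around_note are made up for illustration and have nothing to do with osis2mod or SWORD.

```python
import xml.dom.minidom

# Hypothetical OSIS-like fragment: no space before <note>, one space after it.
SOURCE = "<verse>In the beginning<note>Or: When</note> God created</verse>"


def pretty(xml_text):
    """Pretty-print with minidom, which puts each child node on its own indented line."""
    return xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")


def text_around_note(xml_text):
    """Return the raw text immediately before and after the first <note> element."""
    doc = xml.dom.minidom.parseString(xml_text)
    note = doc.getElementsByTagName("note")[0]
    before = note.previousSibling.data if note.previousSibling is not None else ""
    after = note.nextSibling.data if note.nextSibling is not None else ""
    return before, after


if __name__ == "__main__":
    print(repr(text_around_note(SOURCE)))
    print(repr(text_around_note(pretty(SOURCE))))
```

In the original fragment the text before the note ends with "beginning". After the round trip through the prettifier it ends with a newline and indentation, which would presumably be folded into a single space before the note when the module is built.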
Best regards
David
On Sat, Feb 9, 2019 at 14:41, DM Smith <dmsmith at crosswire.org> wrote:
> There are several things in play with respect to whitespace in an OSIS document as it pertains to a CrossWire module rendered by SWORD or JSword to a frontend.
> 1) osis2mod’s handling of whitespace.
> 1a) The parser that osis2mod uses to read the OSIS document is not a validating parser. This means that whitespace between elements is always considered important.
> 1b) Newlines are replaced by a space. Note: carriage returns (from Windows-style line endings) and tabs are not converted. If present they are passed through as is.
> 1c) Multiple spaces are folded into a single space.
> 1d) Verses are trimmed of leading and trailing space.
> 1e) Verses in the index have a trailing DOS newline (CR LF), even if not present in the input.
>
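
The folding described in 1b to 1d can be sketched roughly as follows. This is only an illustration in Python based on the description above; normalize_verse is a hypothetical name and this is not osis2mod's actual code. Step 1e (the trailing DOS newline in the index) is left out.

```python
import re


def normalize_verse(text):
    """Approximate steps 1b-1d: newline to space, fold spaces, trim."""
    text = text.replace("\n", " ")      # 1b: a newline becomes a space; CRs and tabs pass through
    text = re.sub(r" {2,}", " ", text)  # 1c: runs of spaces fold into one space
    return text.strip(" ")              # 1d: leading and trailing space is trimmed


if __name__ == "__main__":
    print(repr(normalize_verse("  In the\n   beginning  ")))
```

Under these assumptions, indentation that a pretty printer adds in the middle of a verse does not disappear. It survives as a single space.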
> 2) Rendering
> 2a) The parser that SWORD uses to render an OSIS module is not a validating parser. This means that whitespace between elements is always considered important.
> 2b) HTML and RTF are different beasts. In HTML, elements such as <div>, <p>, <br> produce line breaks in the output, and how they are rendered is determined by CSS, possibly implicit default styles. RTF is precise and controlled by the document.
>
> 3) Pretty print of an OSIS XML document.
> 3a) Nearly all pretty printers will introduce spaces between elements.
> <?xml version="1.0" ?>
> <List name="Fruit List">
>   <Item>Apple</Item>
>   <Item>Banana</Item>
>   <Item>Pear</Item>
> </List>
> This introduces text: each newline and indent between elements becomes a whitespace-only text node.
> If the pretty printer had put the newlines and spaces inside the tags themselves, it would not have introduced extra content.
> <?xml version="1.0" ?>
> <List name="Fruit List"
>   ><Item>Apple</Item
>   ><Item>Banana</Item
>   ><Item>Pear</Item
> ></List>
>
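
The difference between the two layouts in 3a can be checked with any XML parser. Below is a small sketch using Python's xml.dom.minidom (my choice for illustration, not the parser osis2mod or SWORD uses); the helper whitespace_text_nodes is a hypothetical name, and the fruit list is shortened to two items.

```python
import xml.dom.minidom

# Newlines and indentation placed between elements.
BETWEEN = '<List name="Fruit List">\n  <Item>Apple</Item>\n  <Item>Pear</Item>\n</List>'

# The same newlines and indentation placed inside the tags instead.
INSIDE = '<List name="Fruit List"\n  ><Item>Apple</Item\n  ><Item>Pear</Item\n></List>'


def whitespace_text_nodes(xml_text):
    """Return the whitespace-only text nodes that are direct children of the root."""
    root = xml.dom.minidom.parseString(xml_text).documentElement
    return [
        node.data
        for node in root.childNodes
        if node.nodeType == node.TEXT_NODE and not node.data.strip()
    ]


if __name__ == "__main__":
    print(len(whitespace_text_nodes(BETWEEN)))
    print(len(whitespace_text_nodes(INSIDE)))
```

The first layout yields three whitespace-only text nodes under the root, the second yields none, because whitespace before the closing ">" of a tag is part of the markup rather than content.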
> 3b) Some pretty printers will introduce spaces at the beginning of text.
> <?xml version="1.0" ?>
> <List name="Fruit List">
>   <Item>
>     Apple
>   </Item>
>   <Item>
>     Banana
>   </Item>
>   <Item>
>     Pear
>   </Item>
> </List>
> If the pretty printer had put the newlines and spaces inside the tags themselves, it would not have introduced extra content.
> <?xml version="1.0" ?>
> <List name="Fruit List"
>   ><Item
>     >Apple</Item
>   ><Item
>     >Banana</Item
>   ><Item
>     >Pear</Item
> ></List>
>
> Best advice for an OSIS module:
> Verse per line.
> Don’t put spaces or new lines after an opening <div>.
>
> In Him,
> DM
>
>> On Feb 8, 2019, at 2:02 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>>
>> Here's a question that I'd like our OSIS experts to ponder.
>>
>> In XML, there's a longstanding topic relating to whitespace.
>>
>> See http://usingxml.com/Basics/XmlSpace
>>
>> When we make a module from an OSIS file, are there any aspects of XML whitespace that can make a significant difference to how the module displays text or features?
>>
>> E.g. Might we inadvertently get a space inserted between a tagged word and a note tag?
>>
>> i.e. As maybe the result of performing a "pretty print" operation on the OSIS source text.
>>
>> cf. I'm sure you can think of other potential areas of interest.
>>
>> AFAIK, this has never been discussed before among us.
>>
>> With various software tools available for making "innocuous" changes to XML files, it's certainly the case that there's nothing to dissuade module providers from using them to "prettify" the OSIS file, even though there might - theoretically at least - be consequences.
>>
>>
>> Best regards,
>>
>> David