[sword-devel] Valid vs Best Practice XML

Chris Little chrislit at crosswire.org
Fri Sep 14 17:15:28 MST 2012



On 09/14/2012 01:02 PM, Greg Hellings wrote:
 > So I've been debugging a module display problem in BibleTime. I
 > mentioned it on IRC with Troy the other day but we weren't able to
 > connect at the same time to discuss further. The issue has to do with
 > paragraph tags - in osis2mod these tags are being converted from <p>
 > to <div sID="someid" type="paragraph" />.

This is extraordinarily bad. This is a change in semantics, because <p> 
and <div type="paragraph"> are not semantically equivalent.

<p> marks the type of paragraph we all probably think of first: 
generally, a chunk of text with newlines before and after.

<div type="paragraph"> marks a formal division within a text that 
happens to be identified as a 'paragraph' and may consist of multiple 
<p>-type paragraphs. Examples of these divisions are found in many laws 
and the Catechism of the Catholic Church (which does exist in OSIS 
form). Here's part 1, section 1, chapter 1, article 1, paragraph 1 of 
the CCC: http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can 
see, it consists of many <p>-type paragraphs but is a single <div 
type="paragraph">-type paragraph.
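
To make the containment concrete, here is a minimal Python sketch. The
OSIS fragment is invented for illustration (it is not taken from the CCC
module, and the OSIS namespace is omitted for brevity); it simply shows
one formal <div type="paragraph"> holding several ordinary <p> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical OSIS fragment (namespace omitted): one formal "paragraph"
# division that contains several ordinary <p> paragraphs.
OSIS_FRAGMENT = """
<div type="article">
  <div type="paragraph">
    <p>First ordinary paragraph.</p>
    <p>Second ordinary paragraph.</p>
    <p>Third ordinary paragraph.</p>
  </div>
</div>
"""

def count_paragraphs(xml_text):
    """Return (formal_divisions, ordinary_paragraphs) found in xml_text."""
    root = ET.fromstring(xml_text)
    # Formal divisions that are merely *labelled* "paragraph".
    formal = root.findall(".//div[@type='paragraph']")
    # Ordinary paragraphs, i.e. chunks of text set off by line breaks.
    ordinary = root.findall(".//p")
    return len(formal), len(ordinary)

print(count_paragraphs(OSIS_FRAGMENT))
```

Rewriting each <p> here as a <div type="paragraph"> would make the two
counts indistinguishable, which is the loss of meaning described above.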

Abhorrent though I consider milestoned <p/>, I think I would much prefer 
to see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us 
clobber the semantics of a defined <div> type.
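
A rough Python sketch of that mapping follows. It is not osis2mod's code,
only an illustration under stated assumptions: the input is well formed,
the function name is made up, and the "gen<N>" ID scheme is invented for
this example.

```python
import re

# Matches <p>, <p attr="...">, and </p>, but not <pb>, <person>, etc.
_P_TAG = re.compile(r"<p(\s[^<>]*)?>|</p>")

def milestone_paragraphs(osis_text, prefix="gen"):
    """Rewrite <p>...</p> containers as <p sID=""/>...<p eID=""/> pairs.

    Hypothetical helper: assumes well-formed input; the ID scheme
    (prefix plus a running counter) is an assumption of this sketch.
    """
    counter = 0
    open_ids = []  # stack of IDs for <p> elements not yet closed

    def replace(match):
        nonlocal counter
        tag = match.group(0)
        if tag.endswith("/>"):
            return tag  # already an empty element; leave it untouched
        if tag == "</p>":
            return '<p eID="%s"/>' % open_ids.pop()
        counter += 1
        pid = "%s%d" % (prefix, counter)
        open_ids.append(pid)
        attrs = match.group(1) or ""  # keep any existing attributes
        return '<p%s sID="%s"/>' % (attrs, pid)

    return _P_TAG.sub(replace, osis_text)

print(milestone_paragraphs("<p>In the beginning</p><p>And God said</p>"))
```

The element name stays <p> throughout, so the original semantics are
preserved even though the container has become a pair of milestones.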

 > Thus, osis2mod is in violation of the suggested XML best practice by
 > creating a non-EMPTY tag as self-closing but this is seemingly pretty
 > common in the OSIS world. Furthermore our filters are producing
 > invalid (or very strongly discouraged) HTML as per every still-in-use
 > version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
 > opinion that this represents a bug in SWORD - at the very least in the
 > filters that permit empty, self-closing div tags to slip through what
 > are supposedly HTML outputs. Do others agree or disagree on this?

I'm of the opinion that our OSIS is generally fine, meaning we should go 
ahead and keep allowing self-closing OSIS tags if possible (as input and 
output from osis2mod and as content of modules not produced by 
osis2mod). The XML best practice you cite is just a recommendation, and 
specifically a recommendation for the purpose of aiding processing with 
legacy SGML tools, which I can't see us doing and don't personally care 
about. (The semantic violation noted above is a bug in my mind, but that 
issue is orthogonal.)

I would agree that the filter output is buggy if we're generating 
disallowed tag forms. OSIS <div> and <p> would need to be translated to 
their correct, non-self-closing HTML forms. Beyond those two, I can't 
think of any tags that have the same form and general semantics in both 
OSIS and HTML.
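
As a sketch of the kind of translation I mean (this is not the code of
any SWORD filter; the function name is hypothetical and attribute
handling is deliberately left out), something along these lines in
Python would turn milestoned or empty OSIS <p> and <div> into
non-self-closing HTML tags:

```python
import re

# Self-closing <p .../> or <div .../>; \b keeps <pb/> and similar out.
_EMPTY_TAG = re.compile(r"<(p|div)\b([^<>]*)/>")

def osis_milestones_to_html(text):
    """Replace self-closing OSIS <p>/<div> with explicit HTML tags.

    Hypothetical helper: sID milestones become start tags, eID
    milestones become end tags, and any other empty element becomes an
    explicit open/close pair. OSIS attributes are dropped here; a real
    filter would presumably map them to something like class names.
    """
    def replace(match):
        name, attrs = match.group(1), match.group(2)
        if re.search(r'\bsID="', attrs):
            return "<%s>" % name
        if re.search(r'\beID="', attrs):
            return "</%s>" % name
        return "<%s></%s>" % (name, name)

    return _EMPTY_TAG.sub(replace, text)

print(osis_milestones_to_html('<p sID="gen1"/>Text<p eID="gen1"/>'))
```

The result nests correctly only if the milestones were properly paired
and nested in the module to begin with; that is assumed here.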

--Chris



