[sword-devel] Valid vs Best Practice XML

DM Smith dmsmith at crosswire.org
Fri Sep 14 13:15:16 MST 2012


Just to focus the post on what I see at play.

A few issues here:
1) Does OSIS's milestone form of container elements violate XML best practices, and does it matter? The sID/eID pair is a common OSIS construct and, IIRC, is also used in the TEI world.
2) What should SWORD filters do when outputting vertical whitespace? I've noted on other threads that there are other problems. This mostly deals with nesting.
3) How should SWORD filters handle container elements that can cross other container elements, especially when verses are shown in isolation or in table cells? E.g., a paragraph or div that starts in one verse and ends within another.
4) Should osis2mod use this form for a milestoned paragraph, which OSIS does not have?

I think the filters would do better if they output a <br/> for these milestoned paragraph divs.
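
To make that suggestion concrete, here is a rough sketch in Python (not SWORD's actual C++ filter code). The function name and the regular expressions are hypothetical, and it assumes the milestones arrive as empty-element <div> tags carrying type="paragraph" plus an sID or eID attribute, as in the examples quoted below.

```python
import re

# Hypothetical pattern: any empty-element <div ... /> tag, attributes in any order.
MILESTONE_DIV = re.compile(r'<div\b([^<>]*?)/>')


def milestone_paragraphs_to_br(fragment: str) -> str:
    """Replace milestoned paragraph divs with <br/>; leave other markup alone."""

    def replace(match: re.Match) -> str:
        attrs = match.group(1)
        is_paragraph = re.search(r'\btype="paragraph"', attrs) is not None
        # \b keeps osisID="..." from being mistaken for an sID/eID milestone.
        is_milestone = re.search(r'\b[se]ID="[^"]*"', attrs) is not None
        if is_paragraph and is_milestone:
            return "<br/>"
        return match.group(0)

    return MILESTONE_DIV.sub(replace, fragment)


if __name__ == "__main__":
    verse = ('<div sID="p1" type="paragraph" />In the beginning'
             '<div type="paragraph" eID="p1" />')
    print(milestone_paragraphs_to_br(verse))
```

Note that this sketch emits a <br/> for both the start and the end milestone. A real filter might want to emit it for only one of the two so that adjacent paragraphs do not produce a double break.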

On Sep 14, 2012, at 4:02 PM, Greg Hellings <greg.hellings at gmail.com> wrote:

> So I've been debugging a module display problem in BibleTime. I
> mentioned it on IRC with Troy the other day but we weren't able to
> connect at the same time to discuss further. The issue has to do with
> paragraph tags - in osis2mod these tags are being converted from <p>
> to <div sID="someid" type="paragraph" />.
> 
> These tags are passing through to BibleTime and are messing with the
> rendering of the module. In the case of this particular module, the
> <p> tags lie outside of the verses so </p> is being converted to <div
> type="paragraph" eID="something" /> on the end of a verse and the <p>
> is being added as a preverse header <div type="paragraph"
> sID="something" />. Now <div type="paragraph" sID="something" /> is
> technically valid XML because the tag is self-closing. However, the
> <div> tags in OSIS are not defined as necessarily empty tags - that
> is, they are able to hold content; these ones simply are not doing so.
> As such, the XML spec says that they _should_ not be created as
> self-closing (see http://www.w3.org/TR/xml/#d0e2480, the relevant text of
> which reads "Empty-element tags may be used for any element which has
> no content, whether or not it is declared using the keyword EMPTY. For
> interoperability, the empty-element tag should be used, and should
> only be used, for elements which are declared EMPTY.").
> 
> Furthermore, we leave these <div> tags alone in the default HTML and
> XHTML rendering filters. Troy claimed that BibleTime does not use
> SWORD's filters, which is incorrect - our OsisToHtml filter is an
> extension of sword::OSISHTMLHREF with heavily customized output.
> Both BibleTime and SWORD's filters - at least the HTML filters - leave
> div tags in place. I'm not sure what our target HTML version is, but
> if we're targeting HTML4 then the self-closing tag is strongly advised
> against ("SGML systems conforming to [ISO8879] are expected to
> recognize a number of features that aren't widely supported by HTML
> user agents. We recommend that authors avoid using all of these
> features."). If we are targeting HTML5, then the spec provides for
> optional '/' in void elements (area, base, br, col, command, embed,
> hr, img, input, keygen, link, meta, param, source, track, wbr) where
> the character is purely decoration. It is not valid in any other
> native elements. All of them must close with a distinct close tag. See
> http://dev.w3.org/html5/spec-author-view/syntax.html#syntax-start-tag
> for the appropriate text. In XHTML it is also not permitted as stated
> in these two answers here http://www.w3.org/TR/xhtml-media-types/#C_2
> 
> Thus, osis2mod is in violation of the suggested XML best practice by
> creating a non-EMPTY tag as self-closing, but this is seemingly pretty
> common in the OSIS world. Furthermore our filters are producing
> invalid (or very strongly discouraged) HTML as per every still-in-use
> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
> opinion that this represents a bug in SWORD - at the very least in the
> filters that permit empty, self-closing div tags to slip through what
> are supposedly HTML outputs. Do others agree or disagree on this?
> 
> --Greg
> 

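For anyone who wants to check filter output for the problem Greg describes, a quick sketch in Python follows. It flags self-closing tags on elements that are not in the void-element list he quotes from the HTML5 draft. The function name is hypothetical and the regex is only an approximation, not a real HTML parser.

```python
import re

# Void elements exactly as listed in the quoted message (from the HTML5 draft it cites).
VOID_ELEMENTS = {
    "area", "base", "br", "col", "command", "embed", "hr", "img",
    "input", "keygen", "link", "meta", "param", "source", "track", "wbr",
}

# Approximate match for a tag written in empty-element form: <name ... />
SELF_CLOSING = re.compile(r'<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>')


def find_bad_self_closing(html: str) -> list[str]:
    """Return self-closing tags whose element is not a void element."""
    return [
        m.group(0)
        for m in SELF_CLOSING.finditer(html)
        if m.group(1).lower() not in VOID_ELEMENTS
    ]


if __name__ == "__main__":
    sample = '<div type="paragraph" sID="x" />Text<br/><div>ok</div>'
    for tag in find_bad_self_closing(sample):
        print("non-void element written as self-closing:", tag)
```

Running something like this over the rendered text of a module would at least show how often the milestoned divs slip through.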


