[sword-devel] Valid vs Best Practice XML

Greg Hellings greg.hellings at gmail.com
Sat Sep 15 09:56:23 MST 2012


To emphasize that we have an issue here in the SWORD filters, here is
the output from diatheke with the HTMLHREF, HTML and XHTML formats (I
just hacked in support for XHTML in order to test).

greg at Gateway08:~/Source/sword/build (master)$ !diath
diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
(TKE)
greg at Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
<meta http-equiv="content-type" content="text/html;
charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
eID="gen11" type="paragraph"/><br />
(TKE)
greg at Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
type="paragraph"/>
(TKE)

All three are outputting the same verse from the same module. HTML and
XHTML pass <div eID="gen11" type="paragraph"/> through unchanged, which
is what the module contains in its rawest form. HTMLHREF outputs <!/P>,
which is not valid in any of these formats. There are other odd
differences between the three (only HTML emits the <meta> tag, and
XHTML drops the trailing <br />), but none of those seem germane to
this discussion.
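
For anyone who wants to check filter output mechanically, here is a
rough sketch in Python (not SWORD code) that lists self-closed tags
which HTML does not define as void elements. The function name and the
regular expression are my own assumptions, and a regex is no substitute
for a real parser; it is only meant to flag cases like the <div .../>
above. It does not catch the <!/P> marker, which is not a tag at all.

```python
import re

# Void elements as listed in the HTML specification; only these may be
# written without a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Matches a start tag written in self-closing form, e.g. <div a="b"/>.
_SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")


def find_self_closed_non_void(fragment):
    """Return names of self-closed tags that are not HTML void elements."""
    return [
        name
        for name in _SELF_CLOSED.findall(fragment)
        if name.lower() not in VOID_ELEMENTS
    ]


if __name__ == "__main__":
    sample = 'osasanyedhelaga.  <div eID="gen11" type="paragraph"/><br />'
    print(find_self_closed_non_void(sample))
```

Run against the tail of the HTML output above, this reports the div and
ignores the <br />.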

$ ./examples/cmdline/lookup TKE Gen.1.2
==Raw=Entry===============
Genesis 1:2:
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2*
<catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.»
Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba
Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu
Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16;
aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
<div eID="gen11" type="paragraph"/>
==Render=Entry============
		.divineName {			font-variant: small-caps;		}		.wordsOfJesus {color: red;		}	
Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu
va mahinje, osasanyedhelaga.  <!/P><br />
==========================
Entry Attributes:

[ Footnote ]
	[ 1 ]
		body = 1.2* <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo
yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba
Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a
Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3;
aKolose 1.16; aHeberi 1.2.)
		n = 1
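
The raw entry above ends with a milestone <div eID="gen11"
type="paragraph"/>. One direction the HTML filters could take is to
turn the paragraph milestones into ordinary <p> and </p> tags. The
following is only a sketch in Python, not the SWORD filter API; the
function name is hypothetical, and it assumes every type="paragraph"
milestone in the module really came from an OSIS <p>, which is not true
of OSIS in general. A verse rendered on its own can still end up
unbalanced, since its matching sID may sit in an earlier verse.

```python
import re

# A milestoned <div> carrying type="paragraph" plus either an sID (start)
# or an eID (end) attribute, written in self-closing form.
_PARAGRAPH_MILESTONE = re.compile(
    r'<div\b(?=[^<>]*\btype="paragraph")[^<>]*?\b(sID|eID)="[^"]*"[^<>]*/>'
)


def milestones_to_p(osis):
    """Replace paragraph milestones with <p> (for sID) or </p> (for eID)."""
    def replace(match):
        return "<p>" if match.group(1) == "sID" else "</p>"

    return _PARAGRAPH_MILESTONE.sub(replace, osis)


if __name__ == "__main__":
    raw = ('<div sID="gen11" type="paragraph" />Elaboya kayawomele'
           '<div eID="gen11" type="paragraph"/>')
    print(milestones_to_p(raw))
```

Milestones of any other type are left untouched, so a real filter would
still need its own handling for those.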

On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chrislit at crosswire.org> wrote:
>
>
> On 09/14/2012 01:02 PM, Greg Hellings wrote:
>> So I've been debugging a module display problem in BibleTime. I
>> mentioned it on IRC with Troy the other day but we weren't able to
>> connect at the same time to discuss further. The issue has to do with
>> paragraph tags - in osis2mod these tags are being converted from <p>
>> to <div sID="someid" type="paragraph" />.
>
> This is extraordinarily bad. This is a change in semantics, because <p> and
> <div type="paragraph"> are not semantically equivalent.
>
> <p> marks the type of paragraph we all probably think of first: generally, a
> chunk of text with newlines before and after.
>
> <div type="paragraph"> marks a formal division within a text that happens to
> be identified as a 'paragraph' and may consist of multiple <p>-type
> paragraphs. Examples of these divisions are found in many laws and the
> Catechism of the Catholic Church (which does exist in OSIS form). Here's
> part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
> http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it consists
> of many <p>-type paragraphs but is a single <div type="paragraph">-type
> paragraph.
>
> Abhorrent though I consider milestoned <p/>, I think I would much prefer to
> see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber the
> semantics of a defined <div> type.
>
>
>> Thus, osis2mod is in violation of the suggested XML best practice by
>> creating a non-EMPTY tag as self-closing but this is seemingly pretty
>> common in the OSIS world. Furthermore our filters are producing
>> invalid (or very strongly discouraged) HTML as per every still-in-use
>> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
>> opinion that this represents a bug in SWORD - at the very least in the
>> filters that permit empty, self-closing div tags to slip through what
>> are supposedly HTML outputs. Do others agree or disagree on this?
>
> I'm of the opinion that our OSIS is generally fine, meaning we should go
> ahead and keep allowing self-closing OSIS tags if possible (as input and
> output from osis2mod and as content of modules not produced by osis2mod).
> This is just a recommendation and specifically a recommendation for the
> purpose of aiding processing with legacy SGML tools, which I can't see us
> doing and don't personally care about. (The semantic violation noted above
> is a bug in my mind, but that issue is orthogonal.)
>
> I would agree that the filter output is buggy if we're generating disallowed
> tag forms. OSIS <div> and <p> would need to be translated to their correct,
> non-self-closing HTML forms. Beyond those two, I can't think of any tags
> that have the same form & general semantics in both OSIS & HTML.
>
> --Chris


