[sword-devel] div type="paragraph" [was: Valid vs Best Practice XML]

Troy A. Griffitts scribe at crosswire.org
Sat Sep 15 15:11:21 MST 2012


Greg,

Thank you for posting the issue.  I'm still really having a tough time 
understanding the problem.  I know we've been crossing on IRC, so I'm 
not sure if you are seeing any of my responses to you there.

We have code to handle these divs and not pass them through, as shown here:

http://crosswire.org/svn/sword/trunk/src/modules/filters/osisxhtml.cpp

Search for "paragraph"; it should be the 2nd or 3rd hit, and there is a
comment which specifically shows your construct of <div
eID="" type="paragraph" />

The end result is that this gets output as <!P><br />

If you look below in your ./lookup output, you will see this exact output.
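To make "handling" the construct concrete, here is a minimal C++ sketch, assuming a simplified, already-parsed tag. OsisTag and handleParagraphDiv are hypothetical names for illustration only, not the code in osisxhtml.cpp, and the legacy <!P> marker is left out. The point is that the filter consumes the milestone instead of copying it through, emitting only a line break for the end (eID) milestone:

```cpp
#include <map>
#include <string>

// Hypothetical, simplified view of a parsed OSIS tag; SWORD's own tag class differs.
struct OsisTag {
    std::string name;                          // element name, e.g. "div"
    std::map<std::string, std::string> attrs;  // attribute name -> value
};

// Hypothetical handler for milestoned paragraph divs. Returns true when the
// tag is consumed (so the raw <div .../> is never copied to the HTML output)
// and appends any replacement markup to `out`.
bool handleParagraphDiv(const OsisTag &tag, std::string &out) {
    auto type = tag.attrs.find("type");
    if (tag.name != "div" || type == tag.attrs.end() || type->second != "paragraph")
        return false;                 // not ours: leave it for other handlers
    if (tag.attrs.count("eID") > 0)
        out += "<br />";              // end milestone becomes a line break
    // start milestone (sID): nothing is emitted in this sketch
    return true;
}
```

Fed the end milestone from Gen 1:2 (<div eID="gen11" type="paragraph"/>), it returns true and appends "<br />"; a div of any other type returns false and is left for other handlers.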

The <!P> was added for/by GnomeSword years ago and can be taken out if
you grep through the Xiphos code and find that it is no longer needed.
I'm not sure why it was added.

But the end result is that we do process this construct and should
never pass it through.  If BibleTime gets it passed through, then they
are not using our filters, either because they are using their own
distinct filter set or because their filter set overrides this
processing and doesn't accept our default processing.
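The second failure mode can be illustrated with a small C++ class hierarchy. BaseRenderFilter, DelegatingFilter, ShadowingFilter and render() are hypothetical names with simplified signatures, not SWORD's or BibleTime's actual classes: a subclass that overrides the token handler but falls back to the base class keeps the default paragraph processing, while one that never calls the base lets the raw <div .../> through.

```cpp
#include <string>

// Hypothetical base render filter that knows how to consume paragraph milestones.
class BaseRenderFilter {
public:
    virtual ~BaseRenderFilter() = default;
    // Returns true if `token` (tag text without the angle brackets) was handled.
    virtual bool handleToken(const std::string &token, std::string &out) {
        if (token.compare(0, 3, "div") == 0 &&
            token.find("type=\"paragraph\"") != std::string::npos) {
            out += "<br />";
            return true;
        }
        return false;
    }
};

// Overrides one tag, then delegates everything else to the base class.
class DelegatingFilter : public BaseRenderFilter {
public:
    bool handleToken(const std::string &token, std::string &out) override {
        if (token == "title") { out += "<h3>"; return true; }
        return BaseRenderFilter::handleToken(token, out);
    }
};

// Overrides the same tag but never calls the base class, so paragraph
// milestones go unhandled.
class ShadowingFilter : public BaseRenderFilter {
public:
    bool handleToken(const std::string &token, std::string &out) override {
        if (token == "title") { out += "<h3>"; return true; }
        return false;
    }
};

// Simplified driver: unhandled tokens are copied through verbatim.
std::string render(BaseRenderFilter &filter, const std::string &token) {
    std::string out;
    if (!filter.handleToken(token, out)) out = "<" + token + ">";
    return out;
}
```

With the token div eID="gen11" type="paragraph"/, the delegating filter yields "<br />", while the shadowing filter emits the original <div eID="gen11" type="paragraph"/> unchanged.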

If you point me to an svn or git (or whatever) link to the BibleTime
render filter which processes OSIS, I'd be happy to have a look.

Troy


On 09/15/2012 06:56 PM, Greg Hellings wrote:
> To emphasize that we have an issue here in the SWORD filters, here is
> the output from diatheke with HTML, HTMLHREF and XHTML (support for
> which I just hacked in now in order to test).
>
> greg at Gateway08:~/Source/sword/build (master)$ !diath
> diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
> waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
> (TKE)
> greg at Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
> <meta http-equiv="content-type" content="text/html;
> charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
> Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
> Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
> eID="gen11" type="paragraph"/><br />
> (TKE)
> greg at Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
> waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
> type="paragraph"/>
> (TKE)
>
> All three are outputting the same verse from the same module. HTML and
> XHTML output <div eID="gen11" type="paragraph"/>, which is what the
> module has in its rawest form. HTMLHREF outputs <!/P>, which is not
> valid markup of any kind. There are other odd differences between the
> three, but none of those seem germane to this discussion.
>
> $ ./examples/cmdline/lookup TKE Gen.1.2
> ==Raw=Entry===============
> Genesis 1:2:
> Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
> owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2*
> <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.»
> Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba
> Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu
> Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16;
> aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
> <div eID="gen11" type="paragraph"/>
> ==Render=Entry============
> 		.divineName {			font-variant: small-caps;		}		.wordsOfJesus {color: red;		}	
> Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
> owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu
> va mahinje, osasanyedhelaga.  <!/P><br />
> ==========================
> Entry Attributes:
>
> [ Footnote ]
> 	[ 1 ]
> 		body = 1.2* <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo
> yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba
> Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a
> Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3;
> aKolose 1.16; aHeberi 1.2.)
> 		n = 1
>
> On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chrislit at crosswire.org> wrote:
>>
>> On 09/14/2012 01:02 PM, Greg Hellings wrote:
>>> So I've been debugging a module display problem in BibleTime. I
>>> mentioned it on IRC with Troy the other day but we weren't able to
>>> connect at the same time to discuss further. The issue has to do with
>>> paragraph tags - in osis2mod these tags are being converted from <p>
>>> to <div sID="someid" type="paragraph" />.
>> This is extraordinarily bad. This is a change in semantics, because <p> and
>> <div type="paragraph"> are not semantically equivalent.
>>
>> <p> marks the type of paragraph we all probably think of first: generally, a
>> chunk of text with newlines before and after.
>>
>> <div type="paragraph"> marks a formal division within a text that happens to
>> be identified as a 'paragraph' and may consist of multiple <p>-type
>> paragraphs. Examples of these divisions are found in many laws and the
>> Catechism of the Catholic Church (which does exist in OSIS form). Here's
>> part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
>> http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it consists
>> of many <p>-type paragraphs but is a single <div type="paragraph">-type
>> paragraph.
>>
>> Abhorrent though I consider milestoned <p/>, I think I would much prefer to
>> see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber the
>> semantics of a defined <div> type.
>>
>>
>>> Thus, osis2mod is in violation of the suggested XML best practice by
>>> creating a non-EMPTY tag as self-closing, but this seems to be pretty
>>> common in the OSIS world. Furthermore, our filters are producing
>>> invalid (or very strongly discouraged) HTML as per every still-in-use
>>> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
>>> opinion that this represents a bug in SWORD - at the very least in the
>>> filters that permit empty, self-closing div tags to slip through what
>>> are supposedly HTML outputs. Do others agree or disagree on this?
>> I'm of the opinion that our OSIS is generally fine, meaning we should go
>> ahead and keep allowing self-closing OSIS tags if possible (as input and
>> output from osis2mod and as content of modules not produced by osis2mod).
>> This is just a recommendation and specifically a recommendation for the
>> purpose of aiding processing with legacy SGML tools, which I can't see us
>> doing and don't personally care about. (The semantic violation noted above
>> is a bug in my mind, but that issue is orthogonal.)
>>
>> I would agree that the filter output is buggy if we're generating disallowed
>> tag forms. OSIS <div> and <p> would need to be translated to their correct,
>> non-self-closing HTML forms. Beyond those two, I can't think of any tags
>> that have the same form & general semantics in both OSIS & HTML.
>>
>> --Chris
>>
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
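Chris's point that OSIS <div> and <p> need to be translated into their non-self-closing HTML forms can be sketched in C++ as below. This is a hypothetical illustration, not existing SWORD code: milestoneToHtml turns an sID/eID milestone into an explicit start or end tag (carrying the OSIS type as a class attribute is an assumption), and isVoidElement lists the elements that current HTML (the WHATWG list) allows without an end tag. An HTML parser ignores the trailing "/" on a non-void element, so a tag like <div .../> opens a div that is never closed.

```cpp
#include <set>
#include <string>

// Void elements of current HTML (WHATWG list): the only elements that may be
// written without an end tag, e.g. <br />. <div> and <p> are not among them.
bool isVoidElement(const std::string &name) {
    static const std::set<std::string> kVoid = {
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"};
    return kVoid.count(name) > 0;
}

// Hypothetical translation of one OSIS milestone (<div sID=.../> or
// <div eID=.../>, likewise for <p>) into an explicit HTML start or end tag.
// `type` is the OSIS type attribute; mapping it to a class is an assumption.
std::string milestoneToHtml(const std::string &name, const std::string &type,
                            bool isStart) {
    if (!isStart) return "</" + name + ">";          // eID milestone
    if (type.empty()) return "<" + name + ">";       // sID without a type
    return "<" + name + " class=\"" + type + "\">";  // sID with a type
}
```

This assumes sID/eID milestones arrive properly paired and nested; milestones that overlap other structures (which is exactly why OSIS uses them) would need extra bookkeeping before they can become well-formed HTML containers.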



