[sword-devel] div type="paragraph" [was: Valid vs Best Practice XML]

Greg Hellings greg.hellings at gmail.com
Sat Sep 15 16:54:26 MST 2012


On Sat, Sep 15, 2012 at 5:11 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
> Greg,
>
> Thank you for posting the issue.  I'm still really having a tough time
> understanding the problem.  I know we've been crossing on IRC, so I'm not
> sure if you are seeing any of my responses to you there.
>

Anything you say while my nick is in the channel is saved by ZNC and
bounced to me the next time I login, up until I manually clear the
logs. So yes, I've been getting the messages you've sent.

> We have code to handle these divs and not pass them through, as shown here:
>
> http://crosswire.org/svn/sword/trunk/src/modules/filters/osisxhtml.cpp
>
> search for "paragraph" and it should be like the 2nd or 3rd hit, but there
> is a comment which specifically shows your construct of <div eID=""
> type="paragraph" />
>
> The end result is that this gets output as <!P><br />
>
> If you look below in your ./lookup output, you will see this exact output.

That output is the result of FMT_WEBIF rendering. I'm not sure exactly
what that is, so I can't speak to that.

When I rebuild with HTMLHREF and XHTML I get <!/P>. That is fine for
HTMLHREF, according to what Chris has said elsewhere and what you state
below, since that format is intended for use by GS/Xiphos. It does not
make for acceptable XHTML, though; it is not valid.

When I rebuild lookup with FMT_HTML I am still seeing the div tag
passed through untouched. That is not valid HTML as discussed earlier
in this thread unless we're hoping to target a very strongly
discouraged construct of an older version of HTML.
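For illustration only, here is a minimal C++ sketch of a check that would flag the problem described above. It is not part of SWORD, and the function name selfClosedNonVoid is hypothetical. The check looks for HTML elements written in self-closing form that are not void elements. Under HTML5 parsing rules the trailing slash is ignored on a non-void element, so <div .../> opens a div that is never closed. The void-element list is the HTML5 one. That list is an assumption about which spec a given front end targets.

```cpp
#include <cctype>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical check (not part of SWORD): returns the names of elements written
// in self-closing form ("<name .../>") that are not HTML void elements.
// Under HTML5 parsing rules the trailing "/" is ignored on non-void elements,
// so "<div .../>" opens a <div> that is never closed.
std::vector<std::string> selfClosedNonVoid(const std::string& html) {
    // HTML5 void elements: the only elements that may legitimately end in "/>".
    static const std::set<std::string> voidElements = {
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "keygen", "link", "meta", "param", "source", "track", "wbr"};
    // Tag name, then any attributes, then "/>" before the next ">".
    static const std::regex selfClosing(R"(<([A-Za-z][A-Za-z0-9]*)\b[^>]*/>)");

    std::vector<std::string> found;
    for (std::sregex_iterator it(html.begin(), html.end(), selfClosing), end;
         it != end; ++it) {
        std::string name = (*it)[1].str();
        for (char& c : name) {  // HTML tag names are case-insensitive
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        if (voidElements.count(name) == 0) {
            found.push_back(name);
        }
    }
    return found;
}
```

Applied to the diatheke -f HTML output for Gen 1:2 quoted in this thread, it would report div. The trailing <br /> is accepted, and <!/P> is not matched at all because it is not an element tag.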

Strangely, I can't get the output of Diatheke and lookup to sync up on
the XHTML results.

>
> The <!P> was added for/by gnomesword years ago and can be taken out if you
> do a grep through the xiphos code and find it not needed any longer.  I'm
> not sure why it was added.
>
> But the end result is that we do process this construct and should never
> pass it through.  If BibleTime gets it passed through, then they are not
> using our filters, either because they are using their own distinct filter
> set, or because their filter set overrides this processing and doesn't
> accept our default processing.

The issue in BibleTime has already been taken care of. This only came
to light because the offending <div> tags were in the preverse
material, which BibleTime does not pass through any filters; instead it
simply strips the tags out of the raw text. I can't pretend to know why
that is a good idea, but I'm not interested in that, only in getting
my module looking correct.

I figured I'd point out the discrepancies between SWORD's usages and
the specs in the meantime. To that point, XHTML and HTML are still
generating invalid output according to lookup.

--Greg

>
> If you point me to an svn or git or whatever link to the Bibletime Render
> Filter which processes OSIS, I'd be happy to have a look.
>
> Troy
>
>
> On 09/15/2012 06:56 PM, Greg Hellings wrote:
>>
>> To emphasize that we have an issue here, in the SWORD filters, here is
>> the output from diatheke with HTML, HTMLHREF and XHTML (support for
>> which I just hacked in to test this).
>>
>> greg at Gateway08:~/Source/sword/build (master)$ !diath
>> diatheke -b TKE -o h -f HTMLHREF -k Gen 1:2
>> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
>> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
>> waviravira vadhulu va mahinje, osasanyedhelaga.  <!/P><br />
>> (TKE)
>> greg at Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f HTML -k Gen 1:2
>> <meta http-equiv="content-type" content="text/html;
>> charset=UTF-8">Genesis 1:2: Elaboya kayawomele naari kayanna dhego.
>> Yaali mahinje ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa
>> Mulugu waviravira vadhulu va mahinje, osasanyedhelaga.  <div
>> eID="gen11" type="paragraph"/><br />
>> (TKE)
>> greg at Gateway08:~/Source/sword/build (master)$ diatheke -b TKE -o h -f XHTML -k Gen 1:2
>> Genesis 1:2: Elaboya kayawomele naari kayanna dhego. Yaali mahinje
>> ooddiiha ni owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu
>> waviravira vadhulu va mahinje, osasanyedhelaga.  <div eID="gen11"
>> type="paragraph"/>
>> (TKE)
>>
>> All three are outputting the same verse from the same module. HTML and
>> XHTML are outputting <div eID="gen11" type="paragraph"/> which is what
>> the module has in its rawest form. HTMLHREF outputs <!/P> which is not
>> valid anything. There are other, odd, differences between the three
>> but none of those are germane to this discussion, it would seem to me.
>>
>> $ ./examples/cmdline/lookup TKE Gen.1.2
>> ==Raw=Entry===============
>> Genesis 1:2:
>> Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
>> owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu<note n="1">1.2*
>> <catchWord>Muneba wa Mulugu</catchWord> naari wi «pevo yuulubale.»
>> Mulugu ohukalana muneba mmohi oneethanihu «Muneba Woweela.» Muneba
>> Woweela ohukamihedha voopaddusiwa elabo. Mwaana a Mulugu, Yesu
>> Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3; aKolose 1.16;
>> aHeberi 1.2.)</note> waviravira vadhulu va mahinje, osasanyedhelaga.
>> <div eID="gen11" type="paragraph"/>
>> ==Render=Entry============
>>                 .divineName {                   font-variant: small-caps;
>> }               .wordsOfJesus {color: red;              }
>> Elaboya kayawomele naari kayanna dhego. Yaali mahinje ooddiiha ni
>> owoopiha yahuruwedhiwe ni yiihi. Muneba wa Mulugu waviravira vadhulu
>> va mahinje, osasanyedhelaga.  <!/P><br />
>> ==========================
>> Entry Attributes:
>>
>> [ Footnote ]
>>         [ 1 ]
>>                 body = 1.2* <catchWord>Muneba wa Mulugu</catchWord> naari
>> wi «pevo yuulubale.» Mulugu ohukalana muneba mmohi oneethanihu «Muneba
>> Woweela.» Muneba Woweela ohukamihedha voopaddusiwa elabo. Mwaana a
>> Mulugu, Yesu Kirisitu, teto ohukamihedha moopaddusa (Zhuwawu 1.1-3;
>> aKolose 1.16; aHeberi 1.2.)
>>                 n = 1
>>
>> On Fri, Sep 14, 2012 at 7:15 PM, Chris Little <chrislit at crosswire.org>
>> wrote:
>>>
>>>
>>> On 09/14/2012 01:02 PM, Greg Hellings wrote:
>>>>
>>>> So I've been debugging a module display problem in BibleTime. I
>>>> mentioned it on IRC with Troy the other day but we weren't able to
>>>> connect at the same time to discuss further. The issue has to do with
>>>> paragraph tags - in osis2mod these tags are being converted from <p>
>>>> to <div sID="someid" type="paragraph" />.
>>>
>>> This is extraordinarily bad. This is a change in semantics, because <p> and
>>> <div type="paragraph"> are not semantically equivalent.
>>>
>>> <p> marks the type of paragraph we all probably think of first:
>>> generally, a chunk of text with newlines before and after.
>>>
>>> <div type="paragraph"> marks a formal division within a text that happens
>>> to be identified as a 'paragraph' and may consist of multiple <p>-type
>>> paragraphs. Examples of these divisions are found in many laws and the
>>> Catechism of the Catholic Church (which does exist in OSIS form). Here's
>>> part 1, section 1, chapter 1, article 1, paragraph 1 of the CCC:
>>> http://www.vatican.va/archive/ENG0015/__P16.HTM. As you can see, it
>>> consists of many <p>-type paragraphs but is a single
>>> <div type="paragraph">-type paragraph.
>>>
>>> Abhorrent though I consider milestoned <p/>, I think I would much prefer
>>> to see us map <p>...</p> to <p sID=""/>...<p eID=""/> than see us clobber
>>> the semantics of a defined <div> type.
>>>
>>>
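Below is a rough C++ sketch of the fallback Chris describes: container <p>...</p> mapped onto milestoned <p sID=""/>...<p eID=""/>. The element stays a <p> rather than being repurposed as <div type="paragraph">. This is not how osis2mod is implemented. The function name milestoneParagraphs and the generated "pN" ids are hypothetical. The sketch also assumes bare <p> start tags with no attributes and no nested paragraphs.

```cpp
#include <string>

// Hypothetical sketch: rewrites container <p>...</p> into OSIS milestone form
// <p sID="pN"/>...<p eID="pN"/>, keeping <p> semantics instead of turning the
// paragraph into <div type="paragraph">.
// Assumes plain "<p>" start tags (no attributes) and no nested <p> elements.
std::string milestoneParagraphs(const std::string& in) {
    std::string out;
    int counter = 0;
    std::string currentId;  // id of the paragraph currently open
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.compare(pos, 3, "<p>") == 0) {
            currentId = "p" + std::to_string(++counter);
            out += "<p sID=\"" + currentId + "\"/>";  // start milestone
            pos += 3;
        } else if (in.compare(pos, 4, "</p>") == 0) {
            out += "<p eID=\"" + currentId + "\"/>";  // matching end milestone
            pos += 4;
        } else {
            out += in[pos++];  // ordinary text is copied unchanged
        }
    }
    return out;
}
```

For example, "<p>One.</p><p>Two.</p>" becomes "<p sID="p1"/>One.<p eID="p1"/><p sID="p2"/>Two.<p eID="p2"/>".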
>>>> Thus, osis2mod is in violation of the suggested XML best practice by
>>>> creating a non-EMPTY tag as self-closing but this is seemingly pretty
>>>> common in the OSIS world. Furthermore our filters are producing
>>>> invalid (or very strongly discouraged) HTML as per every still-in-use
>>>> version of the specs (HTML4, XHTML, HTML5). As such, I'm of the
>>>> opinion that this represents a bug in SWORD - at the very least in the
>>>> filters that permit empty, self-closing div tags to slip through what
>>>> are supposedly HTML outputs. Do others agree or disagree on this?
>>>
>>> I'm of the opinion that our OSIS is generally fine, meaning we should go
>>> ahead and keep allowing self-closing OSIS tags if possible (as input and
>>> output from osis2mod and as content of modules not produced by osis2mod).
>>> This is just a recommendation and specifically a recommendation for the
>>> purpose of aiding processing with legacy SGML tools, which I can't see us
>>> doing and don't personally care about. (The semantic violation noted above
>>> is a bug in my mind, but that issue is orthogonal.)
>>>
>>> I would agree that the filter output is buggy if we're generating
>>> disallowed tag forms. OSIS <div> and <p> would need to be translated to
>>> their correct, non-self-closing HTML forms. Beyond those two, I can't think
>>> of any tags that have the same form & general semantics in both OSIS & HTML.
>>>
>>> --Chris
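Chris's point about translating OSIS <div> and <p> into proper HTML forms could be sketched along the following lines. This is a hypothetical C++ helper, not SWORD's osisxhtml.cpp filter. Mapping the OSIS type attribute onto an HTML class is an assumption. The helper also produces balanced markup only when each sID milestone and its matching eID are rendered in the same string. That is not the case when a single verse is rendered on its own, as in the Gen 1:2 lookups quoted in this thread, where the eID for gen11 appears without its sID.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not SWORD's osisxhtml.cpp): turns OSIS milestone forms of
// <div> and <p>, such as <div sID="gen11" type="paragraph"/> ...
// <div eID="gen11" type="paragraph"/>, into ordinary HTML start and end tags.
// Assumes each sID milestone has its matching eID later in the same string.
std::string milestonesToHtml(const std::string& osis) {
    static const std::regex milestone(R"(<(div|p)\b([^>]*?)\s*/>)");
    static const std::regex startId(R"delim(\bsID=")delim");
    static const std::regex endId(R"delim(\beID=")delim");
    static const std::regex typeAttr(R"delim(\btype="([^"]*)")delim");

    std::string out;
    auto last = osis.cbegin();
    for (std::sregex_iterator it(osis.begin(), osis.end(), milestone), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy text preceding this tag
        const std::string name = m[1].str();
        const std::string attrs = m[2].str();
        std::smatch type;
        if (std::regex_search(attrs, endId)) {
            out += "</" + name + ">";  // eID milestone closes the element
        } else if (std::regex_search(attrs, startId)) {
            out += "<" + name;  // sID milestone opens the element
            if (std::regex_search(attrs, type, typeAttr)) {
                out += " class=\"" + type[1].str() + "\"";
            }
            out += ">";
        } else {
            out += m[0].str();  // not a milestone: left as-is (still invalid HTML)
        }
        last = m[0].second;
    }
    out.append(last, osis.cend());
    return out;
}
```

A self-closing <div/> that carries neither sID nor eID is passed through unchanged, so a checker like the one for non-void self-closing tags would still flag it.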
>>>
>>>
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>
>


