[sword-devel] Setting canonical="true" ?

DM Smith dmsmith at crosswire.org
Thu Mar 1 07:23:40 MST 2012


From a practical perspective, SWORD and JSword only look at canonical on titles, to determine whether hiding titles and intros (aka headings) should include them.
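As an illustration only, here is a minimal Python sketch of a "hide headings" pass that honours canonical on titles. It is not SWORD or JSword code: the helper names (hide_headings, is_canonical_title, remove_keep_tail, local_name) are hypothetical, it handles only <title> (not intros), and it uses the standard-library xml.etree.ElementTree. Because <title> has a schema default of canonical="false" and so never inherits, only an explicit canonical="true" keeps a title visible.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace: '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]

def is_canonical_title(title):
    # <title> defaults to "false", so it never inherits from its parent:
    # only an explicit xs:boolean true ("true" or "1") counts.
    return title.get("canonical") in ("true", "1")

def remove_keep_tail(parent, child):
    """Remove child without losing the text that follows it (its tail)."""
    if child.tail:
        index = list(parent).index(child)
        if index > 0:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)

def hide_headings(root):
    """Drop non-canonical <title> elements; keep canonical ones such as Psalm titles."""
    # ElementTree has no parent pointers, so collect (parent, child) pairs first.
    doomed = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if local_name(child.tag) == "title" and not is_canonical_title(child)
    ]
    for parent, child in doomed:
        remove_keep_tail(parent, child)
    return root

chapter = ET.fromstring(
    "<chapter osisID='Ps.23'><title>A heading</title>"
    "<title type='psalm' canonical='true'>A Psalm of David.</title>"
    "<verse osisID='Ps.23.1'>The LORD is my shepherd; I shall not want.</verse>"
    "</chapter>")
print(ET.tostring(hide_headings(chapter), encoding="unicode"))
```

The editorial heading is dropped while the Psalm title survives, which is why Psalm titles have to carry canonical="true" explicitly.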

Just a few comments about my understanding. I'm probably being a purist ;) From the manual (v2.1.1), page 18:
> When canonical="true", it means that the content of that element is a part of the text being encoded.
...
> It should be explicitly noted that the value of the canonical attribute should not be used to reflect theological judgment about the content of a text, but merely to distinguish between what has been added to the text and what has not. 
> 
> In most cases use of the canonical attribute is straightforward, and the default values will almost always produce the intended result. However, there will arise truly difficult cases: for example, one may be encoding an ancient text with annotations of its own. In that case those notes would be canonical, while any added by the current editor would not be. In such cases, the practice chosen and its rationale should be described in the work's documentation.

So, I take this to mean that if I were creating an accurate representation of the 1611 KJV from scans, everything in that "ancient" text would be canonical, including introductions, notes, titles, cross-references, and so forth.

If it is not that way, and canonical is instead meant to reflect the underlying publication, then I think there is a problem with the usage of the <transChange type="added"> element. In that case, these should be marked canonical="false", as they are not part of the "base" text.

I took out the example about notes in a Bible translation. Its intent is that canonical distinguishes what was in the text the translation was based on from what was not in that base.

The confusion is that it is not at all clear what "current editor" means. There are many who take the KJV, notes and all, and make changes to it, say modernizing the spelling or translating it into another language, and so on. Since their base is not the Hebrew and Greek but a particular KJV text, then according to this definition the imported notes are now canonical.

But as a module encoder, I'd follow the OSIS defaults, with one exception: the <div> element.
> The canonical attribute is available on all elements. 
> 
The following elements lack the canonical attribute:
osis
osisCorpus
teiHeader
work
workPrefix

> It has a ‘default’ value so it does not have to be entered by the encoder if the default value is acceptable. 
> 

A bit misleading. Only a few (8) elements actually have a default. Note that chapter is not among them. And having it on osisText is silly (see below).
Default: true <xs:attribute name="canonical" type="xs:boolean" use="optional" default="true"/>
osisText
verse

Default: false <xs:attribute name="canonical" type="xs:boolean" use="optional" default="false"/>
header
div
note
reference
title
titlePage
> The value of this attribute is "inherited," that is once it is set, any subelement of that element inherits the same setting. 
> 

Default: inherited <xs:attribute name="canonical" type="xs:boolean" use="optional"/>
The rest of the elements.

The examples on the same page are confusing, as they don't fit with the XML inheritance mechanism. They have an explicit value on a parent element forcing the inclusion of the attribute on an element with that as a default. Having a default value means that that element never inherits the value.

With inheritance, it should be possible, at any point in the document, to use an XML parser to ask what the value of canonical is.
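A minimal sketch of such a query, assuming the inheritance model the manual describes (explicit value, else the schema default, else the parent's value) rather than anything the XSD actually declares. It uses Python's xml.etree.ElementTree; resolve_canonical and local_name are hypothetical helper names, and the sample document is simplified and not schema-valid.

```python
import xml.etree.ElementTree as ET

# Defaults from the OSIS 2.1.1 schema, as listed above. Elements not in this
# table have no default and (per the manual) inherit from their parent.
DEFAULTS = {
    "osisText": True, "verse": True,
    "header": False, "div": False, "note": False,
    "reference": False, "title": False, "titlePage": False,
}

def local_name(tag):
    """Strip an XML namespace: '{ns}div' -> 'div'."""
    return tag.rsplit("}", 1)[-1]

def resolve_canonical(elem, inherited=None, result=None):
    """Map every element under elem to its effective canonical value.

    Precedence (the manual's model, not real XSD 1.1 inheritance):
    explicit attribute, then schema default, then the parent's value.
    None means nothing determined it (e.g. <osis>, which has no attribute).
    """
    if result is None:
        result = {}
    explicit = elem.get("canonical")
    if explicit is not None:
        value = explicit in ("true", "1")  # xs:boolean lexical forms
    else:
        value = DEFAULTS.get(local_name(elem.tag), inherited)
    result[elem] = value  # recorded before recursing: document order
    for child in elem:
        resolve_canonical(child, value, result)
    return result

doc = ET.fromstring(
    "<osis><osisText><div type='book'><title>Genesis</title>"
    "<chapter><verse>In the beginning<note>a note</note></verse></chapter>"
    "<div type='section' canonical='true'><p>kept</p>"
    "<div type='subSection'><p>reset</p></div></div>"
    "</div></osisText></osis>")

for elem, value in resolve_canonical(doc).items():
    print(local_name(elem.tag), value)
```

Note that <chapter> inherits false from the book <div>, and the nested <div type='subSection'> goes back to false even though its parent declares canonical='true', because <div> has its own default and therefore never inherits.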

However, the attribute "canonical" is not actually inheritable, according to:
http://www.w3.org/TR/2009/WD-xmlschema11-1-20090130/#Inherited_attributes
> 3.3.5.6 Inherited Attributes
> 
> Schema Information Set Contribution: Inherited Attributes
> [Definition:]  An attribute information item A, whether explicitly specified in the input information set or defaulted as described in Attribute Default Value (§3.4.5.1), is potentially inherited by an element information item E if and only if all of the following are true:
> 1 A is among the [attributes] of one of E's ancestors.
> 2 A and E have the same [validation context].
> 3 One of the following is true:
> 3.1 A is ·attributed to· an Attribute Use whose {inheritable} = true.
> 3.2 A is not ·attributed to· any Attribute Use but A has a ·governing attribute declaration· whose {inheritable} = true.
> If and only if an element information item P is not ·skipped· (that is, it is either ·strictly· or ·laxly· assessed), in the ·post-schema-validation infoset· each of P's element information item [children] E which is not ·attributed to· a skip Wildcard, has a property:
> PSVI Contributions for element information items
> [inherited attributes]
> A list of attribute information items. An attribute information item A is included if and only if all of the following are true:
> 1 A is ·potentially inherited· by E.
> 2 Let O be A's [owner element]. A does not have the same expanded name as another attribute which is also ·potentially inherited· by E and whose [owner element] is a descendant of O.
> 
I presume this is a bug in the OSIS Schema.

From a practical perspective in encoding a whole document, there are two scenarios to consider:
1) Milestoning structural elements. (BCV: Book, Chapter and Verse encoding)
2) Milestoning verses. (BSP: Book, Section and Paragraph encoding, recommended)

First, the text of the work has to be within (using my notation):
<osis><osisCorpus>(<osisText>(<header>...</header>)*(<titlePage>...</titlePage>)?(<div>CONTENT</div>)+</osisText>)+</osis>
or
<osis>(<osisText><header>...</header>(<titlePage>...</titlePage>)?(<div>CONTENT</div>)+</osisText>)+</osis>
(Note: osis2mod expects only one osisText)

The significant part is the <div>: it cannot be in milestoned form and still pass validation. The default value of canonical on this element is "false". Therefore, all descendants not contained in elements whose default is "true", or that explicitly declare canonical="true", inherit the value "false".

Because divs can be nested, each div resets the state of canonical, either to its default of false or to its declared canonical value.

The fact that <osisText> defaults canonical to true is meaningless: all of its children (<header>, <titlePage> and <div>) default to false. So, practically speaking, the only element with canonical="true" is a verse and its contents that don't have

The other implication of using the non-milestoned form of <div> is that, by OSIS semantics, all other <div>s have to be container elements, not milestones. (I can quote the OSIS 2.1.1 manual if needed.) Personally, I think this is too broad a rule for <div>; it should take the type attribute into consideration.

In case 1), where the document uses the container form for books (<div type="book">), <chapter> and <verse>, and uses the milestoned form of other containers as needed or as semantically required, the intention of the OSIS manual is preserved. The defaults work as intended.

However, in case 2), where the verse is milestoned, the text and other elements of the verse are not children of the verse element but of the container it is in, typically a paragraph or a div. By the rules of XML (if inheritance were properly specified), that parent container would need to explicitly declare or inherit canonical="true".

SWORD and JSword always work on a fragment of the whole document and might not have the parent from which to determine whether canonical is true or false. In practice, they assume true.
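The two cases, and the fragment problem, can be seen in a small Python sketch built on the same assumed inheritance model (explicit value, else schema default, else parent). It walks ElementTree text and tail strings: the text after a milestone <verse sID=.../> is that element's tail, which belongs to the enclosing <p>. canonical_text is a hypothetical helper, the samples are simplified OSIS without namespaces, and the fragment call with inherited=True only imitates the "assume true" behaviour described above; it is not how SWORD or JSword are implemented.

```python
import xml.etree.ElementTree as ET

# Schema defaults as listed earlier; any other element inherits from its parent.
DEFAULTS = {
    "osisText": True, "verse": True,
    "header": False, "div": False, "note": False,
    "reference": False, "title": False, "titlePage": False,
}

def local_name(tag):
    """Strip an XML namespace: '{ns}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]

def canonical_text(elem, inherited=None, out=None):
    """Return (text, canonical) pairs in document order.

    elem.text belongs to elem, but a child's tail belongs to the parent,
    so text following a milestone <verse sID=.../> takes the parent's value.
    """
    if out is None:
        out = []
    explicit = elem.get("canonical")
    if explicit is not None:
        value = explicit in ("true", "1")
    else:
        value = DEFAULTS.get(local_name(elem.tag), inherited)
    if elem.text and elem.text.strip():
        out.append((elem.text.strip(), value))
    for child in elem:
        canonical_text(child, value, out)
        if child.tail and child.tail.strip():
            out.append((child.tail.strip(), value))
    return out

# Case 1 (BCV): the verse is a container, so the words are the verse's own text.
BCV = ("<osisText><div type='book'><chapter osisID='Gen.1'>"
       "<verse osisID='Gen.1.1'>In the beginning</verse>"
       "</chapter></div></osisText>")
# Case 2 (BSP): the verse is milestoned, so the words belong to the <p>.
BSP = ("<osisText><div type='book'><p>"
       "<verse sID='Gen.1.1' osisID='Gen.1.1'/>In the beginning"
       "<verse eID='Gen.1.1'/></p></div></osisText>")

print(canonical_text(ET.fromstring(BCV)))  # [('In the beginning', True)]
print(canonical_text(ET.fromstring(BSP)))  # [('In the beginning', False)]

# The same <p> processed as a fragment, with "true" assumed as its context.
fragment = ET.fromstring(BSP).find("div/p")
print(canonical_text(fragment, inherited=True))  # [('In the beginning', True)]
```

The last two lines show the mismatch: the full document says the milestoned verse text is not canonical, while a fragment-based reader that assumes true says it is.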

If the OSIS schema defaulted canonical on <div> to true, or gave it no default so that it inherits (making the default on osisText meaningful), there would be no issue.

That is to say, I think the OSIS Schema has it wrong for <div>. Until or unless it is changed, one nearly always has to put canonical="true" on a div.
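Until the schema changes, an encoding tool could add the attribute in a post-processing pass. The following is a hypothetical Python sketch (mark_divs_canonical and its exclude_types parameter are illustrative, not part of any existing tool): it sets canonical="true" on every <div> that does not already declare canonical, skipping any div types the encoder wants to leave non-canonical.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace: '{ns}div' -> 'div'."""
    return tag.rsplit("}", 1)[-1]

def mark_divs_canonical(root, exclude_types=frozenset()):
    """Set canonical="true" on each <div> lacking an explicit canonical
    attribute, unless its type is in exclude_types. Returns how many changed."""
    changed = 0
    for elem in root.iter():  # includes root itself, in document order
        if (local_name(elem.tag) == "div"
                and elem.get("canonical") is None
                and elem.get("type") not in exclude_types):
            elem.set("canonical", "true")
            changed += 1
    return changed

root = ET.fromstring(
    "<osisText><div type='book'>"
    "<div type='introduction'><p>About this book</p></div>"
    "<div type='section' canonical='false'><p>Editorial matter</p></div>"
    "<p>Text</p></div></osisText>")
print(mark_divs_canonical(root, exclude_types={"introduction"}))
print(ET.tostring(root, encoding="unicode"))
```

Only the book <div> gets the attribute here: the introduction is excluded by choice, and the section already says what it means. Because <title>, <note> and <reference> carry their own default of false, marking a div this way does not make headings or notes inside it canonical. Which div types to exclude is the encoder's judgment call.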

In Him,
	DM

On Feb 29, 2012, at 2:46 PM, Troy A. Griffitts wrote:

> Sorry to only jump in on problems, but...
> 
> I don't believe the preceding explanation of 'canonical' is correct.
> 
> OSIS defaults many attributes to canonical, including <verse> and <chapter>
> 
> I believe we defined canonical as text belonging to the base work.
> 
> For us, this is mostly Bibles.
> 
> For a study Bible, it would exclude all commentary and notes, and only include Biblical text.
> 
> Basically, canonical for the Open Scripture Information Standard refers to Biblical text, and you'd be hard-pressed to use it for anything else practically, though I could see a purist trying to make an argument for it.
> 
> For example, Josephus would only include the text of Josephus.
> 
> And while technically true, the practical uses for 'canonical' are things like:
> 
> Showing Psalm titles even when the user has asked not to show 'titles'
> Searching typically is only over 'canonical' text
> 
> -- but we usually work the opposite way: we take out notes, xrefs, and headings, and index what is left, so the Josephus example isn't practically a problem for us right now (plus I think our Josephus module only contains Josephus text). And this is simply for indexed searching. Our full-text searching allows you to search any of these other fields: notes, xrefs, headings, just about anything in an entry attribute. We have talked about providing indexed searching for some of these things, but really? How often do you search the notes? Just wait the 4 seconds to do the unindexed search. But we have lots of future ideas of how to modularize the search framework so a frontend could supply a filter which outputs what to include in a named Lucene index. Anyway, tangent...
> 
> 
> Summary,
> <verse> already indicates canonical material by default
> Psalm titles, being canonical and usually not within a verse (unless it's a v11n which includes them in a verse), need to be marked specifically as canonical.
> 
> If the OSIS docs say different, let me know and I'll poke the editor.
> 
> Troy
> 
> 
> 
> On 02/29/2012 07:11 PM, David Haslam wrote:
>> Thanks DM,
>> 
>> Someone like to volunteer to enhance usfm2osis.pl to ensure that
>> canonical="true" is set as it should be?
>> 
>> David
>> 
>> 
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
> 
