[sword-devel] the future of OSIS support (importer/filters)

Wed Apr 27 05:02:26 MST 2005

Just a couple of comments, so most of the thread is stripped. Also, some 
of this is really more a question for OSIS. Chris, hopefully you can 
pass it along, if appropriate.

Chris Little wrote:

> DM Smith wrote:
>
>> I agree that support should be limited to 2.0. Or perhaps 2.1, if it 
>> is pretty near completion. At the OSIS website, you cannot find 
>> documentation for prior versions. This makes it difficult to manage 
>> an earlier version of OSIS. Also, 2.0 is such a significant 
>> improvement that it should be enough motivation to cut over.
>
> I think 2.1 is pretty stable and it may be a while before any of this 
> particular suggestion really gets implemented, so my meaning is really 
> that we should adopt whatever is current at the time. In any case, for 
> our purposes 2.0 and 2.1 are virtually identical.

I also compared features and they are mostly unchanged, but the 
differences are significant to frontends. On the <hi> element, type has 
been replaced by rend as the attribute that holds bold, italic, etc.

I am suggesting that we don't create a need to know the OSIS version 
number for a while. If sword has modules that are encoded according to 
the most recent OSIS, then we may have modules that use every version 
of OSIS. If sword instead says that modules are 2.0 and then, when OSIS 
has changed significantly (say 2.7), says that 2.7 is now used for 
new modules, this will create an easier upgrade path for frontends.

<snip/>

>>> Verse numbers are not necessarily a single digit and do not 
>>> necessarily flow in numerical order. Encoding <verse> elements 
>>> (along with their n attributes, when present) permits us to render 
>>> lettered verses and range verses easily. It affords us the 
>>> possibility of rendering out-of-order verses (though this will 
>>> require some additional thinking/work). And until multiple 
>>> versifications are actually supported, it allows us to fake them.
>>
>> I am not sure what you are thinking, but I don't think it will work. 
>> The verse (start/length) index will point to the verse as it is in 
>> its order, not by its number. Or it will be massaged to refer to the 
>> verse by its number and not its order. Unless more information is 
>> added to the index (i.e. what the verse actually is, which at this 
>> time is implicit by its offset into the index), this will lead to 
>> inconsistencies. We have discussed these at great length here so I 
>> won't repeat them again.
>
<snip/>

> Until then, however, we store non-canonical verses in the previous 
> canonical verse. If we had verse elements (and chapter too, in the 
> case of Ps.151), we could at least render these more attractively. As 
> it is, they just look like a single (big) verse, without verse numbers. 
> Like I said, it's basically faked, since you can't actually reference 
> the individual non-canonical verses (that's part of the v11n work). 
> But rendering a readable Bible is an improvement over the current 
> situation.

For others, non-canonical simply means that which is not described by 
canon.h.

>> So, where do you break a verse? Is everything between verses included 
>> by the following verse? What about material before the first verse in 
>> a chapter/book or work? (i.e. do we actually support introductory 
>> material and if so, how is it delineated?)
>
> Yes, material preceding a verse goes in the verse that follows the 
> material. The exception is the first verse of a chapter. Material 
> preceding the first verse of a chapter goes in the chapter intro. 
> Material preceding a chapter element goes in the book intro.
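
The assignment rule described above can be sketched roughly as follows. 
This is only an illustration, not the importer's actual code: the event 
stream, the function name and the storage keys are all made up for the 
example. The thread does not say what happens to material left over 
after the last verse, so the sketch just parks it under "unassigned".

```python
from collections import defaultdict

def assign_material(events):
    """Decide which entry each piece of loose material is stored under.

    `events` is a hypothetical flat stream of tuples:
      ("chapter", n), ("verse_start", n), ("verse_end", n), ("text", s)
    Keys in the result are "book intro", (chapter, "intro"),
    (chapter, verse) or "unassigned".
    """
    stored = defaultdict(list)
    chapter = 0          # 0 means no chapter element has been seen yet
    in_verse = None      # number of the currently open verse, if any
    seen_verse = False   # has the current chapter had a verse yet?
    pending = []         # material outside any verse, not yet assigned

    for kind, value in events:
        if kind == "text":
            if in_verse is not None:
                stored[(chapter, in_verse)].append(value)
            else:
                pending.append(value)
        elif kind == "chapter":
            # Rule as stated: material preceding a chapter element
            # goes in the book intro.
            stored["book intro"].extend(pending)
            pending = []
            chapter, seen_verse = value, False
        elif kind == "verse_start":
            # Before the first verse: chapter intro. Otherwise the
            # material belongs to the verse that follows it.
            target = (chapter, value) if seen_verse else (chapter, "intro")
            stored[target].extend(pending)
            pending = []
            in_verse, seen_verse = value, True
        elif kind == "verse_end":
            in_verse = None

    if pending:
        stored["unassigned"].extend(pending)
    return dict(stored)
```

Applied literally, a heading that sits between the last verse of one 
chapter and the next chapter element ends up in the book intro.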

Should the algorithm look for special "stuff", say <title>, that stands 
before the first verse? I don't think that this necessarily belongs in 
an intro.

And I don't understand why introductory material for the minor prophets 
is added to the intro of Isa, or why material that stands in front of a 
<chapter> goes into the book intro. That seems to take it way out of 
the orderly flow. Isn't this akin to a title that stands before an 
element belonging with that element?
<snip/>

> All this is already supported by the API. Introductions have always 
> been part of Sword modules. How frontends support it is not my 
> business, but it would be best if they rendered it properly. :D

It's on our list of things to do :)

>>> We also have the option of normalizing OSIS to a form of our 
>>> choosing. Towards that end, we CAN require that all 
>>> book/chapter/verse tags be milestones.
>>
>> You have already noted that some OSIS container elements are not 
>> milestoneable. For any OSIS work with significant structural markup, 
>> these will result in milestones being used for verses, likely for 
>> chapters and possibly for book (though I am not aware of any instance 
>> of structure crossing a book boundary.)
>
> I don't think anything crosses book boundaries, either, so we /could/ 
> permit container book divs. Likewise, we could probably force chapters 
> to be well-formed XML. There's really only one place (Rev.12-Rev.13) 
> where paragraphs ever cross a chapter division. Arguably, q does at 
> some points (but q will often be milestoned). So we could normalize 
> containers that cross chapters as milestones, if that helps anyone and 
> provided there are no negative consequences anyone can think of.

Using milestones for divs would help verse-at-a-time systems, since a 
div is designed to be one of the largest containers.

>> From earlier threads on quotes, there are several quote markers that 
>> need to be handled.
>> Block vs inline quotes. (The <q> tag is used for both, but it is not 
>> clear when to render one or the other. These are structural elements, 
>> not simply rendering issues. Does OSIS define a mechanism for this?)
>
> Block quotes need to have type="block" set.

OSIS 2.01 and 2.1 do not document this. 2.01 really only has a 
placeholder for describing the element. 2.1 goes on at great length. 
However, it looks as if they are still thinking it through. Their 
suggestion is to use type of initial|medial|final to indicate whether a 
quote mark is an initial, continuation or final one.

It seems that type should be block|inline and sub-type should be 
initial|medial|final, as this would allow for both inline and 
blockquotes to contain nested and interrupted quotes.

Also, the notion of medial is interesting: it argues for a quote 
element that is neither a beginning sID element nor an ending eID 
element, but something else. Since the sID and eID are paired with the 
same value, is there a need for an mID with the same value?

>> Beginning quote mark, continuing quote mark, end quote mark, nested 
>> begin/continue and end quote marks, and nested within nested quote 
>> marks. (I consider this to be a structural issue. Notice, there is no 
>> mention of the actual marks that are used.)
>
>
> Nesting can be specified by the level attribute. Which mark is used is 
> supposed to be a style-sheet issue, hence my suggestion that we handle 
> it in .confs. However, there is also the n attribute, where you can 
> put the rendered form of the quotation mark, I believe. (I forget, but 
> we might have also talked about adding a rend attribute to serve this 
> purpose instead.)

The level and the n attributes are not documented in either the 2.01 or 
the 2.1 manual. But I think that using the level attribute to indicate 
the depth of nesting is sufficient. And having a rend attribute hold the 
marker as provided by the publisher is an excellent idea. (I like rend 
better than n, as n is used by other XML systems to be a numbering 
scheme, e.g. (pretend example) <br n="3" /> means three line breaks.)

<snip/>

>> Can we include information on the <q> element concerning the kind of 
>> quote mark that is used? (I don't mean the actual mark)
>
> I presume we would define something like level 1, 2, ... n marks that 
> begin & end a quotation and that mark both sides of a break in 
> quotation (according to what a language requires). English, for 
> example would need levels 1 & 2, beginning, continuation beginning, 
> and end--6 marks total (level = level modulo 2). So if you hit a tag 
> that reads <q eID="..." level="2"/>, you know to render a single 9 
> quotation mark.
>
> We could do this on a per-translation basis or a per-language basis 
> and we could allow switching based on locale or user preference.

If n/rend is used to indicate the original marker, then we don't need to 
change the conf for this. Locale files could/should be used to hold the 
quotation system.
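
To make the level/position lookup concrete, here is a minimal sketch of 
what a per-locale quotation table might look like. The table layout and 
the names ENGLISH_MARKS and quote_mark are invented for the example; 
nothing like this exists in the locale files today. It assumes English 
repeats the opening mark when a quotation continues after a break.

```python
# Hypothetical per-locale table: (folded level, position) -> mark.
# English needs two levels and three positions, six entries in total.
ENGLISH_MARKS = {
    (1, "begin"): "\u201c",     # double 6-shaped mark
    (1, "continue"): "\u201c",  # repeated at the start of a continuation
    (1, "end"): "\u201d",       # double 9-shaped mark
    (2, "begin"): "\u2018",     # single 6-shaped mark
    (2, "continue"): "\u2018",
    (2, "end"): "\u2019",       # single 9-shaped mark
}

def quote_mark(level, position, marks=ENGLISH_MARKS, depth=2):
    """Return the mark for a nesting level and position.

    Levels deeper than `depth` wrap around (3 -> 1, 4 -> 2, ...),
    which is the "level modulo 2" idea for English.
    """
    folded = (level - 1) % depth + 1
    return marks[(folded, position)]
```

With this table, a tag such as <q eID="..." level="2"/> maps to 
quote_mark(2, "end"), the single 9-shaped mark.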

<snip/>

>> 2) Both have references to other entries. In the case of Strongs, it 
>> will refer from Strongs Greek to Strongs Hebrew as well as internally.
>> When I tackle Naves, I want to be able to create internal cross 
>> references as well as references to verses.
>
> We should probably make the Greek & Hebrew versions a single module. 
> The current modules are based on databases intended for OLB, so they 
> just have numbers for keys (four digit numbers plus a leading 0 in the 
> source for Hebrew words). A better way to do this is with a leading G 
> or H in the key (osisID). That's how Strong's numbers are referenced 
> in OSIS modules, for example.

The G/H is needed since the numbers overlap. Merging them into one 
module is a great idea, but it will require some frontends to change 
(e.g. BibleDesktop) since they vector the reference to a particular 
module.
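
As a rough illustration of the G/H keying, assuming a merged module 
keyed by a letter plus the bare number (whether the real keys should 
keep the zero padding is an open question, and strongs_key is a made-up 
helper, not an existing API):

```python
def strongs_key(number, language):
    """Build a merged-module key such as "G1234" or "H1234".

    `number` may be an int or a string from the source data, including
    the Hebrew form with a leading 0. Leading zeros are dropped here,
    which is an assumption of this sketch.
    """
    prefixes = {"greek": "G", "hebrew": "H"}
    return f"{prefixes[language]}{int(str(number))}"
```

So strongs_key("01234", "hebrew") and strongs_key(1234, "greek") give 
distinct keys even though the numbers overlap.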

Also, it would be good if the transliteration were changed to the 
original script.

> Anyway, your question is really about cross-referencing. The correct 
> way to do that is with the reference element. Internal 
> cross-referencing we can probably handle pretty easily. <reference 
> osisRef="Moses">Moses</reference> would be used to create a reference 
> to the Moses entry in the same document (technically, whatever element 
> has osisID="Moses"). Frontends don't support this (to my knowledge), 
> but that's how it's supposed to be encoded.
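
A frontend could resolve such an internal reference along these lines. 
This is a sketch only: the sample markup is a made-up fragment without 
namespaces, not a valid OSIS document, and the function names are 
hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; a real OSIS document has more structure.
SAMPLE = """<div>
  <div type="entry" osisID="Moses"><p>Entry text for Moses.</p></div>
  <div type="entry" osisID="Aaron">
    <p>Brother of <reference osisRef="Moses">Moses</reference>.</p>
  </div>
</div>"""

def build_id_index(root):
    """Map every osisID in the document to the element that carries it."""
    return {el.get("osisID"): el for el in root.iter() if el.get("osisID")}

def resolve_internal(root, reference):
    """Return the element whose osisID matches the reference's osisRef,
    or None when no element in this document has that osisID."""
    return build_id_index(root).get(reference.get("osisRef"))

root = ET.fromstring(SAMPLE)
ref = root.find(".//reference")
target = resolve_internal(root, ref)
```

A None result only says the target is not in this document; the sketch 
does not decide what kind of reference it is.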

In OSIS, what distinguishes an internal reference from a Bible verse 
reference?

<snip/>