[sword-devel] other osis2mod import issues (from the Wiki)

DM Smith dmsmith at crosswire.org
Thu Jun 4 16:20:18 MST 2009


On Jun 4, 2009, at 5:36 PM, Chris Little wrote:

> Firstly, DM, let me express my appreciation for your taking on the  
> task of improving osis2mod further. It's a daunting task and I'm  
> glad not to be tackling it myself right now. :)
>
> There are a couple of sections of your Wiki addition from today that  
> concern me. Hopefully they'll be easy to address, but I'm not even  
> sure whether these are implemented or only planned features, having  
> not bothered to read the code.
>
> My first concern is with the conversion of <p>...</p> to <div  
> type="paragraph" sID="genX"/>...<div type="paragraph" eID="genX"/>.  
> My problem with this is the use of "paragraph" here in what is  
> essentially a private-use semantic. A <div type="paragraph"> is  
> already defined by OSIS for use in works that have a structural  
> division called paragraphs (as in law codes and, if I recollect  
> correctly, the Catechism of the Catholic Church). I think osis2mod  
> should instead translate <p>...</p> to something with an x- type,  
> e.g. <div type="x-p" sID="genX"/>...<div type="x-p" eID="genX"/>.

I guess it is a private use semantic. Given that osis2mod also handles  
commentaries, I think it is important to have both allowed. I'll make  
the change to osis2mod, and will give you a patch for the filters.

The problem I had with the prior implementation (which I did), was  
that it used the <lb> element. The <lb/> element is roughly the  
equivalent of <br/> and one of the problems in using it was that it  
did not have precisely the same semantic as a <p> or a </p>. Most web  
browsers, which most SWORD front-ends use for display, will show  
subtle differences between the two. For example,
<div>
   <p>
will typically result in a single new line.
<div>
   <br/>
will typically result in 2 new lines.

The advantage to changing it to <div> is that it minimizes the problem  
by allowing for block element semantics.
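
To make the proposed conversion concrete, here is a minimal sketch in Python (not osis2mod's actual C++ code) that rewrites <p>...</p> containers into paired milestone <div> elements. The function name, the "gen" id prefix, and the default type "x-p" are assumptions for illustration; the sketch also assumes well-formed input and simply drops any attributes on <p>.

```python
import re
from itertools import count


def paragraphs_to_milestones(osis: str, div_type: str = "x-p") -> str:
    """Replace <p>...</p> containers with paired milestone <div> elements.

    Hypothetical helper: ids are generated as gen1, gen2, ... and any
    attributes on the original <p> are discarded.
    """
    ids = count(1)
    open_ids = []  # stack of generated ids for <p> elements still open

    def replace(match: re.Match) -> str:
        if match.group(1) == "/":  # closing </p>: reuse the matching id
            gen_id = open_ids.pop()
            return f'<div type="{div_type}" eID="{gen_id}"/>'
        gen_id = f"gen{next(ids)}"  # opening <p>: allocate a new id
        open_ids.append(gen_id)
        return f'<div type="{div_type}" sID="{gen_id}"/>'

    # Matches <p>, <p attr="...">, and </p>, but not <person> or similar.
    return re.sub(r"<(/?)p(?:\s[^>]*)?>", replace, osis)


print(paragraphs_to_milestones("<p>In the beginning</p>"))
```

Because the start and end markers are both empty elements, a paragraph can cross verse boundaries without breaking well-formedness, which a container <p> cannot do.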

>
> My other concern may be less easily addressed. The Wiki now states  
> that, when using <title>, the attributes type="book" and  
> type="chapter" are required for book and chapter titles  
> respectively. I have no problem with osis2mod adding these type  
> attributes, but they can't be required if we intend to maintain the  
> policy that osis2mod should accept any valid & best practice  
> conformant document. I don't know whether this is actually written  
> down anywhere, but somewhere between the OSIS 1.0 and 2.0, the  
> committee decided that <title> types would be inherited from their  
> parent element. So a <title> whose parent is <chapter> implicitly  
> has type="chapter" and one whose parent is <div type="book">  
> implicitly has type="book".

The manual needs some help then. While it might be in there, I see no  
mention of the type of a title being implied by its parent.
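
For reference, the inheritance Chris describes could be sketched roughly as follows. This is a Python illustration only, not anything osis2mod does today; the function name is hypothetical, and the sketch ignores the OSIS XML namespace for brevity.

```python
import xml.etree.ElementTree as ET


def implied_title_types(root):
    """Return (title_element, effective_type) pairs.

    An explicit type attribute wins. Otherwise the type is taken from the
    parent: <chapter> implies "chapter", <div type="book"> implies "book".
    Any other untyped title gets None.
    """
    results = []
    for parent in root.iter():
        for child in parent:
            if child.tag != "title":
                continue
            explicit = child.get("type")
            if explicit is not None:
                effective = explicit
            elif parent.tag == "chapter":
                effective = "chapter"
            elif parent.tag == "div" and parent.get("type") == "book":
                effective = "book"
            else:
                effective = None
            results.append((child, effective))
    return results


doc = ET.fromstring(
    '<div type="book"><title>Genesis</title>'
    '<chapter osisID="Gen.1"><title>Chapter 1</title></chapter></div>'
)
for title, effective in implied_title_types(doc):
    print(title.text, "->", effective)
```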

The Wiki should be reworded. I was simply wrong about the reference to what  
goes into the book introduction.  Everything between the opening of a  
book and the first chapter is put into the book introduction.

The problem comes with material between the start of a chapter and the  
first verse of the chapter. The material might be a chapter intro, a  
verse intro or both. The trouble is deciding what the boundary between  
the two should be.

Here is the comment from the code:
                 // Have we found the start of pre-verse material?
                 // Pre-verse material follows the following rules
                 // 1) Between the opening of a book and the first chapter,
                 //    all the material is handled as an introduction to the book.
                 // 2) Between the opening of a chapter and the first verse,
                 //    the material is split between the introduction of the chapter
                 //    and the first verse of the chapter.
                 //    A <div> with a type other than section will be taken as a chapter introduction.
                 //    A <title> of type acrostic, psalm or no type, will be taken as a title for the verse.
                 //    A <title> of type main or chapter will be seen as a chapter title.
                 // 3) Between verses, the material is split between the prior verse and the next verse.
                 //    Basically, while end and empty tags are found, they belong to the prior verse.
                 //    Once a begin tag is found, it belongs to the next verse.
                 // If the title has an attribute type of "main" or "chapter"
                 // it belongs to its <div> or <chapter> and is treated as part of its heading
                 // Otherwise if it is a title in a chapter before the first verse it
                 // is put into the verse as a preverse title.
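
Rule 3 in the comment above can be illustrated with a short Python sketch. This is not the osis2mod implementation; the function name is hypothetical, and the sketch assumes the material between two verses has already been reduced to a list of tag strings (text nodes are left out).

```python
def split_between_verses(tags):
    """Split inter-verse tags into (prior_verse_part, next_verse_part).

    End tags (</x>) and empty tags (<x/>) stay with the prior verse.
    The first begin tag, and everything after it, goes to the next verse.
    """
    for i, tag in enumerate(tags):
        is_end = tag.startswith("</")
        is_empty = tag.endswith("/>")
        if not is_end and not is_empty:  # first begin tag found
            return tags[:i], tags[i:]
    return tags, []  # no begin tag: everything belongs to the prior verse


prior, following = split_between_verses(["</q>", "<lb/>", "<title>", "</title>"])
print(prior)      # tags kept with the prior verse
print(following)  # tags moved to the next verse
```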

If in this location there is a div that has no type or has a type  
other than section, it is seen as part of the introduction. This would  
be something like:
<chapter>
    <div> introductory material </div>
    ...
    <verse n="1">...</verse>

The code assumes that the begin div element without type="section"  
goes into the chapter introduction. It does not assume that the div  
finishes before the first verse. This is important to note.

If a title is seen without a type attribute, or with a type other than  
main or chapter, it is understood to be a title for the first verse.  
Otherwise it is for the chapter.

Once a transition is seen, then the division between chapter  
introduction and pre-verse material is set.

The code is still not very smart. It can be improved. For example, it  
does not know parent/child or sibling relationships. If this were to  
be added then we could say that a title immediately followed a chapter  
or was in a non-section div within the chapter introduction.

Anyway, as a real example, consider the following:
<chapter osisID="Ps.119">
<title>Chapter 119</title>
<title type="acrostic">...</title>
<verse osisID="Ps.119.1">verse text</verse>

Where should this be split? In getting feedback for the KJV most felt  
that the second title should be attached to the first verse.

Likewise for Psalm 3, which uses type="psalm".

So what should the rule be?

The simplest change would be that a title without a type attribute or  
one of main or chapter is seen as a chapter title.
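
To compare the two rules on the Psalm 119 example, here is a small Python sketch. The function names are made up for illustration, and the sketch looks at each title in isolation; it does not model the "once a transition is seen" behaviour described earlier.

```python
CHAPTER_TYPES = {"main", "chapter"}


def title_owner_current(title_type):
    """Current rule: only type main or chapter attaches to the chapter."""
    return "chapter" if title_type in CHAPTER_TYPES else "verse"


def title_owner_proposed(title_type):
    """Proposed rule: an untyped title also counts as a chapter title."""
    if title_type is None or title_type in CHAPTER_TYPES:
        return "chapter"
    return "verse"


# Title types from the Ps.119 example: an untyped title, then an acrostic.
ps119_titles = [None, "acrostic"]
print([title_owner_current(t) for t in ps119_titles])
print([title_owner_proposed(t) for t in ps119_titles])
```

Under the current rule both titles end up on the first verse; under the proposed rule the untyped "Chapter 119" title moves to the chapter while the acrostic title stays with verse 1.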

In Him,
	DM




