Are individual verses in a Sword Module well formed? was Re: [sword-devel] Verses not in sequential order - front-end problem

Chris Little chrislit at crosswire.org
Wed Apr 20 04:58:19 MST 2005



DM Smith wrote:
> I looked again at the OSIS website and could not find that verse with 
> milestones is the best practice. I think I was able to figure out why it 
> would be a necessary practice. It is mentioned that if any OSIS 
> container element is used in the milestone form then that element must 
> always use the milestone element in the entire work.

I don't find anything either, but trust me that this was the effect of 
our decisions. Book/section/paragraph (BSP) is primary. That is the best 
practice. Book/chapter/verse (BCV) is secondary and overlays BSP. BCV 
doesn't identify linguistically significant or linguistically motivated 
segmentation. It is of essentially historical importance and is used 
because it is a widely accepted system today, in spite of many known 
flaws. BSP is based on linguistically motivated segmentation. It's also 
the system that most of the user base from Bible societies & publishing 
use. So... that's a little of the reasoning behind why BSP was chosen 
over BCV.

You should really avoid milestoning elements in the BSP hierarchy (in 
other words, <div> and <p>, though the latter isn't milestoneable). 
However, elements that sometimes cross these boundaries include things 
like <chapter> and <verse>. So, in effect, you have to use milestones 
for <verse> (which crosses <p> boundaries quite frequently). You can 
probably get away with using a container <chapter> in many Bibles since 
translators/publishers go out of their way to avoid things like 
paragraphs that cross chapter boundaries. (However, you might need to 
use milestoned <chapter> if you use container <q>.)
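
To make the overlap problem concrete, here is a minimal sketch in Python. It assumes only the standard library XML parser; the verse text is a placeholder, not taken from any real module. It shows that a container <verse> crossing a <p> boundary is not well-formed XML, while the same content with <verse> milestones (sID/eID) parses cleanly.

```python
import xml.etree.ElementTree as ET

# Container <verse> that starts in one paragraph and ends in the next.
# The <verse> and <p> tags overlap, so this is not well-formed XML.
CONTAINER = (
    '<div>'
    '<p><verse osisID="Gen.1.1">First half of the verse,</p>'
    '<p>second half.</verse></p>'
    '</div>'
)

# Same placeholder text with the verse expressed as a pair of empty
# milestone elements: sID marks the start, eID marks the end.
# <p> stays a normal container.
MILESTONE = (
    '<div>'
    '<p><verse sID="Gen.1.1" osisID="Gen.1.1"/>First half of the verse,</p>'
    '<p>second half.<verse eID="Gen.1.1"/></p>'
    '</div>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


container_ok = is_well_formed(CONTAINER)
milestone_ok = is_well_formed(MILESTONE)
print(container_ok, milestone_ok)
```

Because the milestones are empty elements, they can sit anywhere inside the BSP hierarchy without disturbing it.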

> Help me if I am missing something here:
> If a Bible has rich markup, then there will be a need for milestones. 
> Let's take <q> and <verse> overlapping as in <q>...<verse>... 
> </q>...</verse>
> 1) Milestones are used for <verse> and not for <q>.
> 2) Milestones are used for <q> and not for <verse>.
> 3) Milestones are used for <q> and <verse>.

Actually, you've got me confused below, unless you mixed up 1 and 2. 
Option 2 above says <verse> is not milestoned, but your discussion of 2 
below says it would have to be.

> If 1 is chosen then it will have the most likely side effect of 
> requiring most, if not all other containers to be milestoned. This 
> means: abbr, closer, div, foreign, l, lg, q, salute, seg, signed, and 
> speech. It will be easier to use milestones for all of them unless one 
> is certain that verses will never be split by one.

I don't think <q> would ever cross the boundaries of abbr, closer, 
foreign, salute, or signed.

> If 2 is chosen then it is likely that only verse and possibly chapter 
> will need to be milestoned. So I can see why this may be the best 
> practice. Also, the OSIS manual notes that pretty much the only 
> practical consequence of a verse element is the rendering of a verse 
> number. And of course Sword will use it to mark the start and the length 
> of the verse in the module.
> 
> 3 is the easiest way to adhere to the OSIS rule of consistency in 
> milestoning an element in a work.

When I encode, I use milestones for <verse> and <q>. I use them for 
<verse> because some other people decided it would be the best practice 
and because it simplifies things tremendously to make this 
non-linguistic unit cross linguistic unit boundaries. And I use them for 
<q> because the primary use of <q> is for rendering quotation marks and 
because I consider elements like <l> more important to maintain as 
containers. But it is really the encoder's choice.

> Of the elements that can contain a verse, at least one, <p>, is not 
> milestoneable. So, if a verse ever crosses one of these then using 
> milestones for verses is a must. What is not clear from the schema is 
> which container elements that can contain verses can hold part of a 
> verse. For example, I don't imagine that <cell> or <item> should. <p> is 
> specifically mentioned in the OSIS manual as allowing verses to be split.

In theory, there is no reason why a verse boundary could not occur 
within a <cell> or <item> element. In practice, I can't think of a time 
when it does. Most instances of <cell> and <item> that I have seen in 
Bibles occurred in a way that contained the element entirely within a 
<verse>.

> With regard to the Sword API, it is possible to get a single verse. If 
> the verse has an element end tag and not its begin or a begin element 
> and not its end, i.e. it is not well formed, then an XML parse of that 
> verse will fail. OSIS does not require that a verse be well-formed. Does 
> Sword in making a module from OSIS ensure that each verse is well formed?
> 
> If not, then how should it be handled?

No. There is no guarantee that a verse will contain an end tag matching 
every start tag it contains or a start tag matching every end tag it 
contains. The importers give you almost exactly what the document contains.
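
As an illustration only, here is a rough Python sketch of how a front end might check whether the text returned for a single verse is balanced before handing it to an XML parser. This is not part of the Sword API and not necessarily what Troy has in mind. The function name unbalanced_tags and the sample fragment are hypothetical, and the regular expression is a simplification that assumes attribute values never contain '<' or '>'.

```python
import re

# Matches a start tag, an end tag, or an empty element such as a milestone.
# Simplification: assumes no '<' or '>' inside attribute values.
TAG = re.compile(r'<(/?)([A-Za-z][\w.:-]*)[^<>]*?(/?)>')


def unbalanced_tags(fragment):
    """Return (unopened_end_tags, unclosed_start_tags) for a fragment."""
    open_stack = []   # start tags seen but not yet closed
    unopened = []     # end tags with no matching start tag in the fragment
    for match in TAG.finditer(fragment):
        closing, name, empty = match.group(1), match.group(2), match.group(3)
        if empty:
            continue  # empty element (e.g. <verse sID="..."/>): nothing to balance
        if closing:
            if open_stack and open_stack[-1] == name:
                open_stack.pop()
            else:
                unopened.append(name)
        else:
            open_stack.append(name)
    return unopened, open_stack


# Hypothetical verse text: ends one paragraph and begins another.
fragment = 'end of a paragraph.</p><p>Start of the next <hi type="italic">one</hi>'
result = unbalanced_tags(fragment)
print(result)
```

A non-empty result means the verse cannot be parsed on its own as XML; what to do about it (wrap it, strip the stray tags, or fetch a larger span) is left to the front end.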

Troy has some practical ideas for how to deal with this.

--Chris

