[jsword-devel] OSIS 2.0 and JSword

DM Smith dmsmith555 at yahoo.com
Wed Aug 18 10:17:17 MST 2004


Looking past a JSword 1.0 release, I was studying the OSIS 2.0 schema 
and it looks like it may be tough to handle well. Specifically, there 
are elements that can be either a marker or a container. With regard to 
Bibles specifically, a verse may start smack dab in the middle of one of 
these other elements, or one of these elements may end inside a verse. 
And it might not be just one element that is split by a verse; it may be 
several.

The upshot is that the index to the start and length of a verse may 
result in a fragment that is not well formed XML.
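
To make the problem concrete, here is a minimal Java sketch. The markup 
is made up for illustration (the osisIDs, the text and the class and 
method names are my assumptions, not taken from any real module): a 
quotation opens in one verse and closes in the next, and cutting the 
first verse out by offsets leaves an unclosed <q>. The fragment is 
wrapped in a synthetic root so that only unbalanced tags are reported.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentCheck {
    /** True if the fragment's tags balance once wrapped in a synthetic root. */
    public static boolean isBalanced(String fragment) {
        String doc = "<frag>" + fragment + "</frag>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Cuts text by character offsets, the way a start/length index would. */
    public static String slice(String text, String startMarker, String endMarker) {
        int start = text.indexOf(startMarker);
        int end = text.indexOf(endMarker) + endMarker.length();
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical markup: a quotation starts in verse 1 and ends in verse 2.
        String text = "<p><verse sID=\"X.1.1\"/>He said, <q>first part"
            + "<verse eID=\"X.1.1\"/><verse sID=\"X.1.2\"/>second part</q>"
            + "<verse eID=\"X.1.2\"/></p>";
        String v1 = slice(text, "<verse sID=\"X.1.1\"/>", "<verse eID=\"X.1.1\"/>");
        System.out.println(isBalanced(text)); // true
        System.out.println(isBalanced(v1));   // false: <q> is never closed
    }
}
```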

Maybe I am missing something, but I see only a few solutions:
1) change the meaning of the index so that it results in well-formed 
XML. This well-formed XML will contain the verse or be the verse. 
Software will have to adjust for this.
2) re-encode the encoded bible so that elements are never split by 
verses, but they are transformed into marker elements.
3) do a prepass over the fragment, looking for begin or end tags without 
their counterpart, and artificially add the missing tag.
4) Instead of artificially adding the missing tag, progressively blur 
the passage until it is valid.
5) strip the unmatched tags out. (I think that the code does this as 
part of a multi-step recovery mechanism)
6) get the verse out of the book as a whole and then find the nearest 
ancestor element that fully contains the verse. (can't reliably do 
chapters since paragraphs may be split across chapter boundaries.)
7) Sidestep it (for the most part) by presenting a verse in the context 
of the entire chapter. (I say "for the most part" because a paragraph 
can cross chapter boundaries). But if we have the code to deal with 
chapters, then we will have the code to deal with verses.
 ... any other ...
The first two are out of JSword's control. The others we can do in JSword.
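
As a rough illustration of option 3, a prepass could look something like 
the following Java sketch. The class and method names are hypothetical, 
and this is not how JSword currently does it. It assumes the fragment is 
a contiguous slice of a well-formed document, ignores comments, CDATA 
and processing instructions, and does not cope with '>' inside attribute 
values. Note that an artificially added begin tag cannot recover the 
attributes of the real one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FragmentBalancer {
    // Matches start, end and empty-element tags (a simplification of real XML).
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z_][\\w:.-]*)[^<>]*?(/?)>");

    /** Adds the begin and end tags that the fragment is missing. */
    public static String balance(String fragment) {
        Deque<String> open = new ArrayDeque<>();   // elements opened inside the fragment
        StringBuilder prefix = new StringBuilder(); // begin tags to add in front
        Matcher m = TAG.matcher(fragment);
        while (m.find()) {
            String name = m.group(2);
            boolean isEnd = !m.group(1).isEmpty();
            boolean isEmpty = !m.group(3).isEmpty();
            if (isEnd) {
                if (name.equals(open.peek())) {
                    open.pop(); // closes an element opened in the fragment
                } else {
                    // Opened before the fragment began. Later unmatched end
                    // tags belong to outer elements, so they go further left.
                    prefix.insert(0, "<" + name + ">");
                }
            } else if (!isEmpty) {
                open.push(name);
            }
            // Empty-element (marker) tags need no partner.
        }
        StringBuilder out = new StringBuilder(prefix).append(fragment);
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append(">"); // innermost first
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("tail of quote</q> and <q>head of next"));
        // <q>tail of quote</q> and <q>head of next</q>
    }
}
```

Option 5 would be the same scan, dropping the unmatched tags instead of 
supplying partners for them.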

Any thoughts/response?

Assuming that the fragment is well-formed XML, there are two related 
problems:
What should we do in presenting a fragment that has begin markers for 
non-verse elements but not the corresponding end markers?
Likewise, what about verses that contain the end marker but not the 
corresponding begin marker?

These don't prevent the verse from being well formed, but they do 
prevent it from being fully meaningful.

I will be talking to a friend who is an expert in Legal XML, where they 
have a similar mechanism, to see what their community feels is the best 
practice. He has mentioned that it is a best practice to declare a 
primary container model and have everything that can cross those 
boundaries use marker elements. For example, the primary container model 
differs from one work to another, and for a given work it may be 
document, chapter, section, sub-section, paragraph, with pages and lines 
being markers.
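
Applied to a Bible, that practice would mean rewriting any element that 
can cross a container boundary into a pair of markers, which is roughly 
option 2 above. Here is a small Java sketch of such a rewrite for a 
single element name. The class name, method name and generated ids are 
hypothetical; it uses sID/eID attributes on empty elements to pair the 
markers, and it assumes the input is well formed and has no '>' inside 
attribute values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Milestoner {
    /** Rewrites each container element called name into a begin/end marker pair. */
    public static String toMarkers(String xml, String name) {
        Pattern tag = Pattern.compile(
            "<(/?)" + Pattern.quote(name) + "(\\s[^<>]*?)?(/?)>");
        Matcher m = tag.matcher(xml);
        StringBuilder out = new StringBuilder();
        Deque<String> ids = new ArrayDeque<>(); // ids of markers not yet ended
        int next = 1;
        int last = 0;
        while (m.find()) {
            out.append(xml, last, m.start()); // copy the text between tags
            last = m.end();
            String attrs = m.group(2) == null ? "" : m.group(2);
            if (!m.group(3).isEmpty()) {
                out.append(m.group()); // already a marker, leave as is
            } else if (m.group(1).isEmpty()) {
                String id = name + next++;
                ids.push(id);
                out.append("<").append(name).append(attrs)
                   .append(" sID=\"").append(id).append("\"/>");
            } else {
                out.append("<").append(name)
                   .append(" eID=\"").append(ids.pop()).append("\"/>");
            }
        }
        out.append(xml, last, xml.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toMarkers("<p>a <q>b</q> c</p>", "q"));
        // <p>a <q sID="q1"/>b<q eID="q1"/> c</p>
    }
}
```

After such a rewrite a verse slice can no longer split a <q>, but it can 
still contain a begin marker without its end marker, which is the 
presentation problem described above.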

Does anyone know of a best practice for OSIS, or any other XML field?

Thanks,
    DM



