[jsword-devel] OSIS 2.0 and JSword
DM Smith
dmsmith555 at yahoo.com
Wed Aug 18 10:17:17 MST 2004
Looking past a JSword 1.0 release, I was studying the OSIS 2.0 schema
and it looks like it may be tough to handle well. Specifically, there
are elements that can be either a marker or a container. For Bibles in
particular, a verse may start smack dab in the middle of one of these
other elements, or one of these elements may end in the middle of a
verse. And it might not be just one element that is split by a verse;
it may be several.
The upshot is that slicing the text by a verse's index (its start and
length) may yield a fragment that is not well-formed XML.
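To make the problem concrete, here is a minimal Java sketch (not JSword
code). The class FragmentCheck, the method isWellFormed, and the sample
markup are all hypothetical; the <q> element merely stands in for any
container that spans a verse boundary. The sketch wraps a fragment in a
synthetic root element and asks the JDK's XML parser whether it parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class FragmentCheck {
    /** Returns true if the fragment parses once wrapped in a synthetic root. */
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            // The wrapper lets a fragment hold text and several top-level nodes.
            String doc = "<wrapper>" + fragment + "</wrapper>";
            builder.parse(new InputSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false; // unbalanced or otherwise malformed markup
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical verse slices: a quotation opens in one verse
        // and closes in the next.
        String verse1 = "And he said, <q>Go to the land";
        String verse2 = "that I will show you.</q>";
        System.out.println(isWellFormed(verse1));                // false
        System.out.println(isWellFormed(verse2));                // false
        System.out.println(isWellFormed(verse1 + " " + verse2)); // true
    }
}
```

Each verse slice fails on its own, while the two together parse, which
is exactly the situation a per-verse index runs into.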
Maybe I am missing something, but I see only a few solutions:
1) change the meaning of the index so that it results in well-formed
XML. This well-formed XML will contain the verse or be the verse.
Software will have to adjust for this.
2) re-encode the Bible so that no container element is ever split by a
verse; any element that would be split is transformed into marker
elements instead.
3) do a prepass over the fragment looking for begin or end tags without
their counterpart and artificially add the missing tag.
4) instead of artificially adding tags, progressively blur the passage
until the fragment is well formed.
5) strip the unmatched tags out. (I think that the code does this as
part of a multi-step recovery mechanism)
6) get the verse out of the book as a whole and then find the nearest
ancestor element that fully contains the verse. (can't reliably do
chapters since paragraphs may be split across chapter boundaries.)
7) sidestep it (for the most part) by presenting a verse in the context
of the entire chapter. (I say "for the most part" because a paragraph
can cross chapter boundaries.) But if we have the code to deal with
chapters, then we will have the code to deal with verses.
... any other ...
The first two are out of JSword's control. The others we can do in JSword.
Any thoughts/response?
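As a rough illustration of option 3, here is a Java sketch of such a
prepass. It is not JSword code: the class FragmentBalancer and the
method balance are hypothetical names. It assumes the fragment is a
contiguous slice of a well-formed document, contains no comments or
CDATA sections, and has no '>' inside attribute values. Note that a
synthesized begin tag cannot recover the attributes of the real one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FragmentBalancer {
    // Matches <name ...>, </name> and <name .../>.
    // Group 1: "/" for an end tag; group 2: name; group 3: "/" if empty.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z_][\\w.:-]*)[^<>]*?(/?)>");

    /** Adds the begin and end tags that the slice is missing. */
    static String balance(String fragment) {
        Deque<String> open = new ArrayDeque<>();     // begin tags seen, not yet closed
        StringBuilder prefix = new StringBuilder();  // synthesized begin tags
        Matcher m = TAG.matcher(fragment);
        while (m.find()) {
            String name = m.group(2);
            boolean isEnd = !m.group(1).isEmpty();
            boolean isEmptyElement = !m.group(3).isEmpty();
            if (isEmptyElement) {
                continue;                            // markers need no partner tag
            }
            if (!isEnd) {
                open.push(name);
            } else if (!open.isEmpty() && open.peek().equals(name)) {
                open.pop();                          // matched inside the fragment
            } else {
                // The begin tag lies before the fragment. Later unmatched
                // end tags belong to outer elements, so insert at the front.
                prefix.insert(0, "<" + name + ">");
            }
        }
        StringBuilder suffix = new StringBuilder();
        while (!open.isEmpty()) {
            suffix.append("</").append(open.pop()).append(">"); // innermost first
        }
        return prefix + fragment + suffix;
    }

    public static void main(String[] args) {
        System.out.println(balance("said.</q> <p>Then"));
    }
}
```

Option 5 would be the same scan with the unmatched tags deleted rather
than completed. Either way the result is only well formed; it says
nothing about whether the repaired fragment still means what the
encoder intended.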
Assuming that the fragment is well-formed XML, there are two related
problems:
What should we do when presenting a fragment that has begin markers for
non-verse elements but not the corresponding end markers?
Likewise, what about verses that contain an end marker but not the
corresponding begin marker?
These don't prevent the verse from being well formed, but they do
prevent it from being fully meaningful.
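Detecting these cases is at least straightforward. The Java sketch
below is not JSword code; DanglingMarkers and find are hypothetical
names. It assumes, as I understand the OSIS 2.0 schema, that a marker
pair is written as two empty elements, one with an sID attribute and
one with an eID attribute carrying the same value, and that attribute
values are double-quoted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DanglingMarkers {
    // An empty element carrying sID="..." or eID="...". The \b keeps
    // osisID="..." from being mistaken for sID="...".
    private static final Pattern MARKER =
        Pattern.compile("<[^<>]*?\\b([se])ID=\"([^\"]*)\"[^<>]*?/>");

    /**
     * Returns two lists: the sID values with no matching eID in the
     * fragment, then the eID values with no preceding sID.
     */
    static List<List<String>> find(String fragment) {
        Set<String> started = new LinkedHashSet<>();
        List<String> unopened = new ArrayList<>();
        Matcher m = MARKER.matcher(fragment);
        while (m.find()) {
            String id = m.group(2);
            if (m.group(1).equals("s")) {
                started.add(id);
            } else if (!started.remove(id)) {
                unopened.add(id);        // the begin marker is outside the fragment
            }
        }
        return List.of(new ArrayList<>(started), unopened);
    }

    public static void main(String[] args) {
        // Hypothetical slice: one quotation ends here, another begins.
        String verse = "<q eID=\"q1\"/> And <q sID=\"q2\"/>he said";
        System.out.println(find(verse));
    }
}
```

What to do with the dangling IDs once found (fetch the partner from the
neighboring verses, or render an open-ended quotation) is the open
question.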
I will be talking to a friend who is an expert in Legal XML, where they
have a similar mechanism, to see what their community feels is the best
practice. He has mentioned that it is a best practice to declare a
primary container model and to have everything that can cross those
boundaries use marker elements. The primary container model differs
from one work to another; for a given work it may be, for example,
document, chapter, section, sub-section, paragraph, with pages and
lines being markers.
Does anyone know of a best practice for OSIS, or any other XML field?
Thanks,
DM