Are individual verses in a Sword Module well formed? was Re:
[sword-devel] Verses not in sequential order - front-end problem
Chris Little
chrislit at crosswire.org
Wed Apr 20 04:58:19 MST 2005
DM Smith wrote:
> I looked again at the OSIS website and could not find that verse with
> milestones is the best practice. I think I was able to figure out why it
> would be a necessary practice. It is mentioned that if any OSIS
> container element is used in the milestone form then that element must
> always use the milestone element in the entire work.
I don't find anything either, but trust me that this was the effect of
our decisions. Book/section/paragraph (BSP) is primary. That is the best
practice. Book/chapter/verse (BCV) is secondary and overlays BSP. BCV
doesn't identify linguistically significant or linguistically motivated
segmentation. It is of essentially historical importance and is used
because it is a widely accepted system today, in spite of many known
flaws. BSP is based on linguistically motivated segmentation. It's also
the system that most of the user base from Bible societies & publishing
use. So... that's a little of the reasoning behind why BSP was chosen
over BCV.
You should really avoid milestoning elements in the BSP hierarchy (in
other words, <div> and <p>, though the latter isn't milestoneable).
However, elements that sometimes cross these boundaries include things
like <chapter> and <verse>. So, in effect, you have to use milestones
for <verse> (which crosses <p> boundaries quite frequently). You can
probably get away with using a container <chapter> in many Bibles since
translators/publishers go out of their way to avoid things like
paragraphs that cross chapter boundaries. (However, you might need to
use milestoned <chapter> if you use container <q>.)
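
To make the overlap concrete, here is a minimal sketch in Python. The
element names and the sID/eID milestone attributes are OSIS; the verse
text and the helper name is_well_formed are made up for illustration.
It only checks XML well-formedness and does not validate against the
OSIS schema.

```python
import xml.etree.ElementTree as ET

# A <verse> written as a container that opens in one <p> and closes in the
# next. The tags overlap, so this is not well-formed XML.
CONTAINER_FORM = (
    '<div>'
    '<p><verse osisID="Gen.1.1">First half of the verse,</p>'
    '<p>second half of the verse.</verse></p>'
    '</div>'
)

# The same text with <verse> as two empty milestone elements paired by
# sID/eID. The <p> containers stay intact.
MILESTONE_FORM = (
    '<div>'
    '<p><verse osisID="Gen.1.1" sID="Gen.1.1"/>First half of the verse,</p>'
    '<p>second half of the verse.<verse eID="Gen.1.1"/></p>'
    '</div>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(CONTAINER_FORM))
print(is_well_formed(MILESTONE_FORM))
```

Only the milestone form parses, which is why <verse> ends up milestoned
whenever a verse crosses a <p> boundary.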
> Help me if I am missing something here:
> If a Bible has rich markup, then there will be a need for milestones.
> Let's take <q> and <verse> overlapping as in <q>...<verse>...
> </q>...</verse>
> 1) Milestones are used for <verse> and not for <q>.
> 2) Milestones are used for <q> and not for <verse>.
> 3) Milestones are used for <q> and <verse>.
Actually, you've got me confused, unless you mixed up 1 and 2. Option 2
above says <verse> is not milestoned, but your discussion of option 2
below says it would have to be.
> If 1 is chosen then it will have the most likely side effect of
> requiring most, if not all other containers to be milestoned. This
> means: abbr, closer, div, foreign, l, lg, q, salute, seg, signed, and
> speech. It will be easier to use milestones for all of them unless one
> is certain that verses will never be split by one.
I don't think <q> would ever cross the boundaries of abbr, closer,
foreign, salute, or signed.
> If 2 is chosen then it is likely that only verse and possibly chapter
> will need to be milestoned. So I can see why this may be the best
> practice. Also, the OSIS manual notes that pretty much the only
> practical consequence of a verse element is the rendering of a verse
> number. And of course Sword will use it to mark the start and the length
> of the verse in the module.
>
> 3 is the easiest to adhere to the OSIS rule of consistency in
> milestoning an element in a work.
When I encode, I use milestones for <verse> and <q>. I use them for
<verse> because some other people decided it would be the best practice
and because it simplifies things tremendously to make this
non-linguistic unit cross linguistic unit boundaries. And I use them for
<q> because the primary use of <q> is for rendering quotation marks and
because I consider elements like <l> more improtant to maintain as
containers. But it is really the encoder's choice.
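
As a rough illustration of that choice, the sketch below keeps <lg> and
<l> as containers while <verse> and <q> are milestones. The sample text,
the osisID, and the "q1" identifier are invented for the example, and
the snippet only sorts elements by whether they carry sID or eID.

```python
import xml.etree.ElementTree as ET

# <lg> and <l> are containers; <verse> and <q> are milestone pairs that
# start in the first line and end in the second.
SAMPLE = (
    '<lg>'
    '<l><verse osisID="Ps.1.1" sID="Ps.1.1"/><q sID="q1"/>'
    'First line of the quotation,</l>'
    '<l>second line of the quotation.<q eID="q1"/>'
    '<verse eID="Ps.1.1"/></l>'
    '</lg>'
)

root = ET.fromstring(SAMPLE)


def is_milestone(element):
    """Treat any element carrying sID or eID as a milestone."""
    return element.get("sID") is not None or element.get("eID") is not None


# root.iter() walks the tree in document order, starting with the root.
containers = [el.tag for el in root.iter() if not is_milestone(el)]
milestones = [el.tag for el in root.iter() if is_milestone(el)]

print(containers)
print(milestones)
```

The quotation and the verse both span two <l> elements here, yet the
document stays well formed because only the line structure is nested.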
> Of the elements that can contain a verse, at least one, <p>, is not
> milestoneable. So, if a verse ever crosses one of these then using
> milestones for verses is a must. What is not clear from the schema is
> which container elements that can contain verses can hold part of a
> verse. For example, I don't imagine that <cell> or <item> should. <p> is
> specifically mentioned in the OSIS manual as allowing verses to be split.
In theory, there is no reason why a verse boundary could not occur
within a <cell> or <item> element. In practice, I can't think of a time
when it does. Most instances of <cell> and <item> that I have seen in
Bibles occurred in a way that contained the element entirely within a
<verse>.
> With regard to the Sword API, it is possible to get a single verse. If
> the verse has an element end tag and not its begin or a begin element
> and not its end, i.e. it is not well formed, then an XML parse of that
> verse will fail. OSIS does not require that a verse be well-formed. Does
> Sword in making a module from OSIS ensure that each verse is well formed?
>
> If not, then how should it be handled?
No. There is no guarantee that a verse will contain an end tag matching
every start tag it contains or a start tag matching every end tag it
contains. The importers give you almost exactly what the document contains.
Troy has some practical ideas for how to deal with this.
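
For what it's worth, one possible way a front-end could cope is
sketched below. This is not the Sword API and not necessarily what Troy
has in mind; the function names unbalanced_tags and balance are
hypothetical. It assumes the fragment was cut out of a well-formed
document and that attribute values contain no "<" or ">" characters.

```python
import re
import xml.etree.ElementTree as ET

# Matches a start tag, end tag, or empty-element tag.
TAG = re.compile(r'<(/?)([A-Za-z_][\w.\-]*)([^<>]*?)(/?)>')


def unbalanced_tags(fragment):
    """Return (unopened_ends, unclosed_starts) for a verse fragment."""
    open_stack = []      # start tags seen but not yet closed
    unopened_ends = []   # end tags whose start lies before the fragment
    for match in TAG.finditer(fragment):
        closing, name, _attrs, empty = match.groups()
        if empty:
            continue  # empty elements (e.g. milestones) need no partner
        if not closing:
            open_stack.append(name)
        elif open_stack and open_stack[-1] == name:
            open_stack.pop()
        else:
            unopened_ends.append(name)
    return unopened_ends, open_stack


def balance(fragment):
    """Add bare start/end tags so the fragment becomes well formed."""
    unopened_ends, unclosed_starts = unbalanced_tags(fragment)
    # Unopened ends appear innermost first, so open them in reverse order.
    prefix = "".join("<%s>" % name for name in reversed(unopened_ends))
    # Unclosed starts are closed innermost first.
    suffix = "".join("</%s>" % name for name in reversed(unclosed_starts))
    return prefix + fragment + suffix


verse_text = 'and he said.</q> <q>Then she answered,'
fixed = balance(verse_text)
# Wrap in a dummy root so a fragment with several top-level nodes parses.
ET.fromstring("<fragment>" + fixed + "</fragment>")
print(fixed)
```

The synthesized start tags carry no attributes, so anything that
depends on the original attributes would have to be recovered from the
neighbouring verses.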
--Chris