[sword-devel] Why is OSIS preferred? Was Re: usfm2osis.pl
Chris Little
chrislit at crosswire.org
Tue Jul 1 06:06:09 MST 2008
>> ** ThML is xml, but is layered upon HTML. It does not separate
>> presentation from content. Cross-references are ad-hoc.
>
> ThML is also still (I think) used by the greatest percentage of our
> modules (though that may be changed in the future).
>
> Separating presentation from content is a nice idea, but I'm not
> convinced that it is good in all cases. What happens with OSIS when a
> Bible publisher wants to insist that certain constructs in their Bible
> are formatted in certain ways?
First, content labeled as ThML is often *not* XML--but ThML from CCEL
probably is validated against their DTD. ThML is based on the Voyager
Strict HTML DTD with a few TEI-inspired elements added, but naturally
hardly anyone ever validates against the DTD.
ThML remains the markup of a large percentage of our content, but that
percentage is declining. New Bibles will always be OSIS (or plain). New
commentaries will always be OSIS. New Dictionaries will probably be TEI
(sometimes OSIS). New GenBooks will preferably be OSIS or TEI, but
might appear in ThML.
The OSIS TC answer to the question of mandated rendering with particular
markup is: use a stylesheet. The CrossWire answer is to use <hi/> for
styling or put information in type/subType to indicate rendering. But
the issue hasn't ever actually come up.
>> * OSIS is a growing, maturing standard, addressing the short-comings of
>> other popular formats.
>
> And adding some of its own (its complexity comes to mind here, though
> possibly that is intrinsic given what it is trying to cover).
>
> In my view adding milestoning and so forth left the path of strictly
> hierarchical XML. It's still valid XML, but it's not really what XML
> was intended to do. I don't know enough to comment on whether this
> was really necessary or if there is a better way to do it, but it does
> mean that valid OSIS XML may not be valid OSIS (this is true of most
> XML formats, in fact - OSIS just carries it further than most).
Simple things are simple to encode. Complex things are more difficult.
If you look at Bibles encoded in ThML, GBF, or Zefania, it is absolutely
trivial to convert them to OSIS. You can probably encode an OSIS Bible
from any of these formats using 1:1 element substitution, without any
milestoning.
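As a rough illustration of 1:1 element substitution, here is a minimal
Python sketch. The source element and attribute names (bible, book, chap,
v, id, n) are invented for the example and do not belong to ThML, GBF, or
Zefania. The output uses container-style OSIS verses and omits the osis
root element, header, and namespace that a complete OSIS document needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup: these element and attribute names are
# invented for illustration and are not taken from ThML, GBF, or Zefania.
SOURCE = """<bible>
  <book id="Gen">
    <chap n="1">
      <v n="1">placeholder text of verse 1</v>
      <v n="2">placeholder text of verse 2</v>
    </chap>
  </book>
</bible>"""


def convert(source_xml):
    """Map each source element to exactly one OSIS element (no milestones)."""
    src = ET.fromstring(source_xml)
    osis_text = ET.Element("osisText")  # bible -> osisText
    for book in src.findall("book"):
        book_id = book.get("id")
        # book -> div type="book"
        div = ET.SubElement(osis_text, "div", {"type": "book", "osisID": book_id})
        for chap in book.findall("chap"):
            chap_id = f"{book_id}.{chap.get('n')}"
            # chap -> chapter
            chapter = ET.SubElement(div, "chapter", {"osisID": chap_id})
            for v in chap.findall("v"):
                # v -> verse, used as a container around the verse text
                verse_id = f"{chap_id}.{v.get('n')}"
                verse = ET.SubElement(chapter, "verse", {"osisID": verse_id})
                verse.text = v.text
    return osis_text


osis = convert(SOURCE)
verse_ids = [v.get("osisID") for v in osis.iter("verse")]
print(ET.tostring(osis, encoding="unicode"))
```

Every source element maps to one OSIS element, so the hierarchy carries
over unchanged. This only works while verses nest cleanly inside chapters
and paragraphs.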
OSIS' improvement over these formats is in its ability to encode much
more complex Bibles as well. Milestoning is a necessity to encode
multiple, overlapping hierarchies, such as are present in Bibles. What
do you do with a Bible where Rev 12:17 begins in Rev 12 and ends in Rev
13? In OSIS, you encode it as:
<verse osisID="Rev.12.17" sID="Rev.12.17"/>
....
</div>
</p>
</chapter>
<chapter osisID="Rev.13">
<title>Chapter 13</title>
<div type="section">
<p>
....
<verse eID="Rev.12.17"/>
<verse osisID="Rev.13.1" sID="Rev.13.1"/>
In other formats, you have to compromise the text. The cost of complex
textual structure is complex markup.
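To show how a reader of the markup might reassemble a milestoned verse,
here is a minimal Python sketch. It rests on several assumptions: the
fragment is a completed, well-formed version of the example above with
placeholder text, the OSIS namespace is omitted, title text is treated as
not belonging to the verse, and the collect_verses helper is hypothetical
rather than part of any SWORD or OSIS tool.

```python
import xml.etree.ElementTree as ET

# A completed, well-formed version of the fragment above. The verse text is
# placeholder text, and the OSIS namespace is left out to keep this short.
FRAGMENT = """<div type="book" osisID="Rev">
<chapter osisID="Rev.12">
<div type="section">
<p>
<verse osisID="Rev.12.16" sID="Rev.12.16"/>text of verse 16 <verse eID="Rev.12.16"/>
<verse osisID="Rev.12.17" sID="Rev.12.17"/>first part of verse 17
</p>
</div>
</chapter>
<chapter osisID="Rev.13">
<title>Chapter 13</title>
<div type="section">
<p>
rest of verse 17 <verse eID="Rev.12.17"/>
<verse osisID="Rev.13.1" sID="Rev.13.1"/>text of verse 1 <verse eID="Rev.13.1"/>
</p>
</div>
</chapter>
</div>"""


def collect_verses(root):
    """Gather the text between each sID milestone and its matching eID."""
    verses = {}
    current = None  # sID of the verse that is currently open, if any

    def add(text):
        # Keep only non-blank text that falls inside an open verse.
        if current is not None and text and text.strip():
            verses[current].append(text.strip())

    def walk(elem):
        nonlocal current
        if elem.tag == "verse":
            if elem.get("sID") is not None:
                current = elem.get("sID")      # start milestone opens a verse
                verses[current] = []
            elif elem.get("eID") == current:
                current = None                 # end milestone closes it
        elif elem.tag != "title":              # assumption: titles are not verse text
            add(elem.text)
            for child in elem:
                walk(child)
                add(child.tail)                # text that follows the child

    walk(root)
    return {osis_id: " ".join(parts) for osis_id, parts in verses.items()}


verses = collect_verses(ET.fromstring(FRAGMENT))
for osis_id, text in verses.items():
    print(osis_id, "->", text)
```

The walk follows document order and ignores the element hierarchy when
deciding which verse a piece of text belongs to. That is why Rev.12.17 can
collect text from both chapter containers.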
--Chris