[sword-devel] the future of OSIS support (importer/filters)
DM Smith
dmsmith555 at yahoo.com
Tue Apr 26 16:08:33 MST 2005
Chris Little wrote:
> At the moment, I'm working on a number of new modules. I'm encoding
> them as OSIS documents, but have avoided actually importing them to
> Sword because of the mess that osis2mod currently is. It needs to be
> fixed and I'll probably have to do it myself, but I think we need to
> discuss/debate exactly what our expectations are for what will be
> copied from an OSIS document to a Sword module.
I'd be willing to help. We are almost at a 1.0 release for JSword and
while I hope to work on Nave's and the migration of cp1252 modules to
UTF-8 next, this would be a worthy diversion.
> I have one introductory comment: at present, our OSIS support targets
> a hodgepodge of OSIS 1.1, 1.5, 2.0, and proprietary extensions. This
> is what the OSIS filters target when you specify SourceType=OSIS in
> your .conf file. As an initial recommendation, I would suggest that we
> break away from this and create new, strictly-conformant, OSIS 2.0
> filters, which we would signal with either SourceType=OSIS2.0 or
> (preferably) OSISVersion=2.0. This would, for example, eliminate the
> need to handle deprecated elements/types (like <div type="chapter"> as
> a Bible chapter container). It also permits us to adopt other changes
> to the way we interpret OSIS (which I discuss below). This does NOT
> mean that we necessarily drop proprietary extensions that conform to
> OSIS (e.g. x-types), though proprietary tags would have to be
> translated to appropriate <seg>/<milestone>-type tags.
I agree that support should be limited to 2.0. Or perhaps 2.1, if it is
pretty near completion. At the OSIS website, you cannot find
documentation for prior versions. This makes it difficult to manage an
earlier version of OSIS. Also, 2.0 is such a significant improvement that
it should be motivation enough to cut over.
With regard to proprietary extensions, I understand they are necessary,
but I think their use should be very limited and well-documented. Only
when that happens can proper filters be written.
> I've brought this up before, but it seems like it might be a good time
> to discuss it more fully. Going forth, I would like to encode the full
> (or nearly full) content of an OSIS document within a Sword module,
> when it is imported.
I agree, but I would like to see the transformation be lossless, if at
all possible. I don't mind, though, if a full round trip comes back in a
different representation (e.g. verse milestones in the final output where
verse containers were used originally).
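To make the round trip concrete, here is a minimal Python sketch of
rewriting verse containers as milestone pairs. It assumes each container
carries only an osisID attribute and that containers are well-formed. The
function name verses_to_milestones is made up for illustration; it is not
part of osis2mod or JSword.

```python
import re

# Matches an opening <verse osisID="..."> container tag or a closing </verse>.
# Assumption: osisID is the only attribute on the container form.
VERSE_TAG = re.compile(r'<verse osisID="([^"]+)">|</verse>')


def verses_to_milestones(osis: str) -> str:
    """Rewrite <verse> containers as sID/eID milestone pairs."""
    open_ids = []  # osisIDs of containers that have not been closed yet

    def replace(match: re.Match) -> str:
        osis_id = match.group(1)
        if osis_id is None:
            # Closing tag: pair it with the most recently opened verse.
            return f'<verse eID="{open_ids.pop()}"/>'
        open_ids.append(osis_id)
        return f'<verse osisID="{osis_id}" sID="{osis_id}"/>'

    return VERSE_TAG.sub(replace, osis)


if __name__ == "__main__":
    print(verses_to_milestones('<verse osisID="Gen.1.1">In the beginning</verse>'))
```

The text between the tags is untouched, so only the representation of the
verse boundaries changes.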
> Towards that end, I would like osis2mod to copy <verse> tags (both
> open and close, whether container or milestone). I would like this to
> be used as a means for indicating what verse number should be rendered
> as well as where it should be rendered.
I have been waiting/wanting for this for a long time. Some modules
already have this, at least in part, e.g. WLC, if I remember correctly.
> Verse numbers are not necessarily a single digit and do not
> necessarily flow in numerical order. Encoding <verse> elements (along
> with their n attributes, when present) permits us to render lettered
> verses and range verses easily. It affords us the possibility of
> rendering out-of-order verses (though this will require some
> additional thinking/work). And until multiple versifications are
> actually supported, it allows us to fake them.
I am not sure what you are thinking, but I don't think it will work. The
verse (start/length) index will point to the verse as it is in its
order, not by its number. Or it will be massaged to refer to the verse
by its number and not its order. Unless more information is added to the
index (i.e. what the verse actually is, which at this time is implicit
by its offset into the index), this will lead to inconsistencies. We
have discussed these at great length here so I won't repeat them again.
Unless we come up with a good design for a v11n index, I don't think we
should monkey around with the existing index.
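A toy Python model of the problem, in case it helps. This is a simplified
stand-in for a (start/length) index, not the actual Sword file layout, and
all the names in it are invented for the example. The point is that the
slot position is the only thing that says which verse an entry is.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    start: int   # byte offset of the verse text in the data
    length: int  # number of bytes of verse text


def build_index(verses: list[bytes]) -> list[IndexEntry]:
    """Index verses in the order they appear in the source document."""
    index, offset = [], 0
    for text in verses:
        index.append(IndexEntry(offset, len(text)))
        offset += len(text)
    return index


def lookup(index: list[IndexEntry], data: bytes, slot: int) -> bytes:
    """Fetch the text stored at a slot; the slot is the implicit verse identity."""
    entry = index[slot]
    return data[entry.start:entry.start + entry.length]


if __name__ == "__main__":
    # Hypothetical source whose verses are labelled 1, 3, 2 in document order.
    labels = ["1", "3", "2"]
    texts = [b"first ", b"third ", b"second "]
    data = b"".join(texts)
    index = build_index(texts)
    # Slot 1 is implicitly "verse 2", yet it holds the text labelled "3".
    print(labels[1], lookup(index, data, 1))
```

Either the slot follows document order and disagrees with the label, or it
follows the label and disagrees with document order. Nothing in the entry
itself records which choice was made.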
> Since it will also mark the starting position of a verse, this also
> permits us to know when to render material preceding a verse before
> the verse number itself (including titles, notes, & introductory
> material).
Most excellent.
So, where do you break a verse? Is everything between verses included by
the following verse? What about material before the first verse in a
chapter/book or work? (i.e. do we actually support introductory material
and if so, how is it delineated?)
> I also recommend copying <chapter> and <div> tags (open and close,
> container or milestone) to modules. This also permits access to
> non-numeric chapter numbers (e.g. chapters A-F of Esther, once we
> support them through multiple versification).
Sounds good.
> We also have the option of normalizing OSIS to a form of our choosing.
> Towards that end, we CAN require that all book/chapter/verse tags be
> milestones.
You have already noted that some OSIS container elements are not
milestoneable. For any OSIS work with significant structural markup,
these will result in milestones being used for verses, likely for
chapters and possibly for books (though I am not aware of any instance of
structure crossing a book boundary).
From a rendering perspective, BCV elements are event markers. The
start tag/milestone indicates where to put a BCV name/number and what to
render there. The end marker indicates where to put line breaks for
things like verse-at-a-time display.
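Here is a small Python sketch of that event-marker view, limited to
verses. The event tuples and the render function are invented for the
example and do not reflect any existing Sword or JSword API.

```python
def render(events, verse_per_line=True):
    """Render a flat stream of verse start/text/end events as plain text.

    Each event is a tuple: ("start", label), ("text", content) or ("end", label).
    """
    out = []
    for kind, value in events:
        if kind == "start":
            # The start marker says where the verse label goes and what it is.
            out.append(f"{value} ")
        elif kind == "text":
            out.append(value)
        elif kind == "end":
            # The end marker says where a line break belongs, if wanted.
            out.append("\n" if verse_per_line else " ")
    return "".join(out)


if __name__ == "__main__":
    events = [
        ("start", "1"), ("text", "In the beginning"), ("end", "1"),
        ("start", "2a"), ("text", "And the earth"), ("end", "2a"),
    ]
    print(render(events))
```

Because the label comes from the marker rather than from a counter, a
lettered label such as "2a" needs no special handling.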
> I know Joachim has some reservations with copying the </div> tag for a
> book since you can't easily tell whether it is the closing tag of a
> book (and thus not rendered by them) or some other </div>. If we
> require all book/chapter/verse tags to be milestones, we can put a
> type on it (e.g. <div type="book" eID="Gen"/>)--this isn't a normal
> thing to do, but I think it's valid (correct me if I'm wrong).
The </div> tag for a book/testament/work is nearly useless. It tells us
nothing without the corresponding begin tag. Using the milestone end
form provides meaningful information and I think in this case, necessary
information.
In JSword, we will need to manage verses that are not well-formed much
better than we do. An end tag that matches something 50 chapters
earlier would force us to give up early when searching for the start tag.
Since <div> is milestoneable, I would suggest that we transform all divs
into that form. Since div is merely a container whose meaning is
ascribed to it by the type attribute and since it will span many verses
(I know it is possible to have a div in a verse, but I cannot think of
any case where that makes sense), it is problematic for a verse-at-a-time
system like JSword.
>
> I also think we should cease support of OSISqToTick. Quotation marks
> should be encoded as <q> elements. There aren't even many modules that
> use OSISqToTick,
Only one module uses OSISqToTick.
I think that it should be noted that not all quotes have quotation
marks, even in a work.
> and I don't encode new modules with them. We should include some
> style-sheet-type information in the .conf files (with syntax to be
> determined) to indicate how to render <q> tags.
From earlier threads on quotes, there are several quote markers that
need to be handled.
Block vs inline quotes. (The <q> tag is used for both, but it is not
clear when to render one or the other. These are structural elements,
not simply rendering issues. Does OSIS define a mechanism for this?)
Red letter quotes. (I think that OSIS has a well defined mechanism for
this.)
Beginning quote mark, continuing quote mark, end quote mark, nested
begin/continue and end quote marks, and nested within nested quote
marks. (I consider this to be a structural issue. Notice, there is no
mention of the actual marks that are used.)
From a JSword perspective, we work on only the verses that the user
wishes to see. In the context of a fragment of a larger, complicated
quote, there will not be enough information carried in the conf to
determine where we are in the structure of the complex quote to render
it the same as when the entire context is shown.
Can we include information on the <q> element concerning the kind of
quote mark that is used? (I don't mean the actual mark)
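To show why a fragment alone is not enough, here is a rough Python sketch
that walks the whole document once and records which quotes are still
open at the start of each verse. The event names, the IDs and the
function itself are hypothetical. This is one possible way to carry the
missing context, not something OSIS or Sword defines.

```python
def open_quotes_at_verse_start(events):
    """Map each verse ID to the quote IDs still open when that verse starts.

    Each event is a (kind, ident) tuple where kind is "verse_start",
    "q_start" or "q_end".
    """
    open_ids, result = [], {}
    for kind, ident in events:
        if kind == "q_start":
            open_ids.append(ident)
        elif kind == "q_end":
            open_ids.remove(ident)
        elif kind == "verse_start":
            # Copy the list so later changes do not alter this snapshot.
            result[ident] = list(open_ids)
    return result


if __name__ == "__main__":
    events = [
        ("verse_start", "X.1.1"), ("q_start", "q1"),
        ("verse_start", "X.1.2"), ("q_start", "q2"),
        ("verse_start", "X.1.3"), ("q_end", "q2"), ("q_end", "q1"),
        ("verse_start", "X.1.4"),
    ]
    for verse, quotes in open_quotes_at_verse_start(events).items():
        print(verse, quotes)
```

A reader shown only X.1.3 cannot tell from that verse alone that it opens
two levels deep. That depth has to come from somewhere outside the verse.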
>
> Comments encouraged.
While this has been limited to OSIS Bibles, I would like to entertain a
discussion on other works wrt OSIS, for the express purpose of ensuring
that we don't make decisions that need to be revisited.
Specifically, I am thinking about Nave's and Strong's, both of which have
(at least) two interesting characteristics in common:
1) They have two keys. Strong's has a number and the word to which that
number refers. Nave's is similar in that it has both a code and a word for
that code. The basic difference between them is that Strong's uses the
number for the key and displays the word along with the definition, while
Nave's uses the word for the key and does not display the code. Nave's
code is in the source as a means of cross-referencing words.
2) Both have references to other entries. Strong's will refer from
Strong's Greek to Strong's Hebrew as well as internally. When I tackle
Nave's, I want to be able to create internal cross-references as well as
references to verses.
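As a starting point for that discussion, here is a minimal Python sketch
of an entry with two keys plus both kinds of references. The class names,
field names and the placeholder codes are all invented for illustration.
None of this describes the existing module format.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    code: str                 # the number or code for the entry
    word: str                 # the word that code refers to
    definition: str = ""
    see_also: list = field(default_factory=list)  # codes of other entries
    verses: list = field(default_factory=list)    # verse references as strings


class Lexicon:
    """Holds entries and allows lookup by either key."""

    def __init__(self):
        self._by_code = {}
        self._by_word = {}

    def add(self, entry: Entry) -> None:
        self._by_code[entry.code] = entry
        self._by_word[entry.word.lower()] = entry

    def by_code(self, code: str):
        return self._by_code.get(code)

    def by_word(self, word: str):
        return self._by_word.get(word.lower())

    def resolve(self, entry: Entry) -> list:
        """Follow see_also codes, skipping any that are not in this lexicon."""
        return [self._by_code[c] for c in entry.see_also if c in self._by_code]


if __name__ == "__main__":
    lex = Lexicon()
    # Placeholder data only; these are not real Strong's or Nave's entries.
    lex.add(Entry("A1", "Alpha", see_also=["B2"], verses=["Gen.1.1"]))
    lex.add(Entry("B2", "Beta"))
    print([e.word for e in lex.resolve(lex.by_word("alpha"))])
```

Which key is shown to the user and which stays hidden then becomes a
display decision rather than a property of the stored entry.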