[sword-devel] the future of OSIS support (importer/filters)
Chris Little
chrislit at crosswire.org
Tue Apr 26 21:08:33 MST 2005
DM Smith wrote:
> I agree that support should be limited to 2.0. Or perhaps 2.1, if it is
> pretty near completion. At the OSIS website, you cannot find
> documentation for prior versions. This makes it difficult to manage an
> earlier version of OSIS. Also, 2.0 is such a significant improvement that
> it should be enough motivation to cut.
I think 2.1 is pretty stable and it may be a while before any of this
particular suggestion really gets implemented, so my meaning is really
that we should adopt whatever is current at the time. In any case, for
our purposes 2.0 and 2.1 are virtually identical.
> With regard to proprietary extensions, I understand they are necessary,
> but I think their use should be very limited and well-documented. Only
> when that happens can proper filters be written.
Proprietary extensions aren't entirely evil. :) Some are forced on us by
the design of OSIS (at least in the past), as x-Strongs was before 2.0
(or was it before 1.5?). OSIS tackles a limited set of features in each
release. If we wanted to do detailed linguistic or manuscript markup, we
couldn't do that with 2.x. Actually, we couldn't do that with any
version of OSIS Core. That said, I do think we /should/ document
proprietary extensions for internal use and also to share with the OSIS
TC for the purpose of improving future versions of OSIS. (That excludes
things like the pre-verse title type which are essentially intended as
aids to rendering within Sword.)
>> Verse numbers are not necessarily a single digit and do not
>> necessarily flow in numerical order. Encoding <verse> elements (along
>> with their n attributes, when present) permits us to render lettered
>> verses and range verses easily. It affords us the possibility of
>> rendering out-of-order verses (though this will require some
>> additional thinking/work). And until multiple versifications are
>> actually supported, it allows us to fake them.
>
> I am not sure what you are thinking, but I don't think it will work. The
> verse (start/length) index will point to the verse as it is in its
> order, not by its number. Or it will be massaged to refer to the verse
> by its number and not its order. Unless more information is added to the
> index (i.e. what the verse actually is, which at this time is implicit
> by its offset into the index), this will lead to inconsistencies. We
> have discussed these at great length here so I won't repeat them again.
The verse element has an n attribute, which is supposed to be used for
verse number rendering. If you have an element like <verse
osisID="Matt.1.1 Matt.1.2" n="1-2">, Sword frontends will currently
render a "1" for the verse number and make no reference to verse "2".
Yet if you look up either Matt.1.1 or Matt.1.2, you will get that verse.
What should be rendered is "1-2". If we have this element in the data,
we can render the verse number correctly.
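To make that concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes namespace-free verse elements for brevity (real OSIS documents carry the OSIS namespace), and the helper names `verse_label` and `covered_refs` are hypothetical, not part of the Sword API. The label prefers the n attribute and otherwise falls back to the last component of the first osisID, which is roughly what frontends show today.

```python
import xml.etree.ElementTree as ET


def covered_refs(verse: ET.Element) -> list[str]:
    """All references this verse element answers to (osisID is space-separated)."""
    return verse.get("osisID", "").split()


def verse_label(verse: ET.Element) -> str:
    """Text to display as the verse number (hypothetical helper).

    Prefer the OSIS n attribute; otherwise fall back to the last
    component of the first osisID, keeping any grain such as "!a".
    """
    n = verse.get("n")
    if n:
        return n
    refs = covered_refs(verse)
    if not refs:
        return ""
    ref, _, grain = refs[0].partition("!")
    return ref.rsplit(".", 1)[-1] + grain


merged = ET.fromstring('<verse osisID="Matt.1.1 Matt.1.2" n="1-2">...</verse>')
print(verse_label(merged))   # 1-2
print(covered_refs(merged))  # ['Matt.1.1', 'Matt.1.2']
```

Without the n attribute, the same element would fall back to "1", which is the current behavior described above.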
Some Bibles mark sub-verses using elements like <verse
osisID="Matt.1.1!a" n="1a">. As it is, we don't represent sub-verse
numbers, but we could render "1a" if we had verse tags included. The
same goes for verses numbered with non-numeric symbols (or non-Latin
numerals). We could correctly number Hebrew manuscript verses with
Hebrew letters; Greek manuscripts could be numbered with Greek letters;
Arabic Bibles could be numbered with Arabic numbers; etc., if only we
had the verse element.
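Labels like "1a" need no conversion; they come straight from n. For illustration only, here is a Python sketch of how a frontend might derive Hebrew-letter or Arabic-Indic labels itself from an integer verse number. The helper names are hypothetical. The Hebrew conversion is the standard additive scheme, with the customary 9+6 and 9+7 spellings for 15 and 16, and it omits the geresh/gershayim punctuation many editions add.

```python
# Hypothetical helpers for displaying a verse number in a non-Latin system.
ARABIC_INDIC_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0660 + d) for d in range(10))
)

# Hebrew letter values, largest first (final forms are not used for numbers).
HEBREW_VALUES = [
    (400, "ת"), (300, "ש"), (200, "ר"), (100, "ק"),
    (90, "צ"), (80, "פ"), (70, "ע"), (60, "ס"), (50, "נ"),
    (40, "מ"), (30, "ל"), (20, "כ"), (10, "י"),
    (9, "ט"), (8, "ח"), (7, "ז"), (6, "ו"), (5, "ה"),
    (4, "ד"), (3, "ג"), (2, "ב"), (1, "א"),
]


def arabic_indic(n: int) -> str:
    """Render n with Arabic-Indic digits (U+0660..U+0669)."""
    return str(n).translate(ARABIC_INDIC_DIGITS)


def _greedy(value: int, table) -> list[str]:
    """Spend value on the largest letters first."""
    letters = []
    for amount, letter in table:
        while value >= amount:
            letters.append(letter)
            value -= amount
    return letters


def hebrew_numeral(n: int) -> str:
    """Additive Hebrew numeral for 1..999, without geresh/gershayim marks."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles 1..999")
    hundreds, rest = n - n % 100, n % 100
    letters = _greedy(hundreds, HEBREW_VALUES[:4])
    if rest in (15, 16):
        # By convention 15 and 16 are written 9+6 and 9+7, not 10+5 and 10+6.
        letters += ["ט", "ו" if rest == 15 else "ז"]
    else:
        letters += _greedy(rest, HEBREW_VALUES[4:])
    return "".join(letters)
```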
As I said, handling out-of-order issues would take a little more work so
it might better be postponed until v11n is handled better, as you suggest.
Until then, however, we store non-canonical verses in the previous
canonical verse. If we had verse elements (and chapter too, in the case
of Ps.151), we could at least render these more attractively. As it is,
they just look like a single (big) verse, without verse numbers. Like I
said, it's basically faked, since you can't actually reference the
individual non-canonical verses (that's part of the v11n work). But
rendering a readable Bible well would be an improvement over the current
situation.
> So, where do you break a verse? Is everything between verses included by
> the following verse? What about material before the first verse in a
> chapter/book or work? (i.e. do we actually support introductory material
> and if so, how is it delineated?)
Yes, material preceding a verse goes in the verse that follows the
material. The exception is the first verse of a chapter. Material
preceding the first verse of a chapter goes in the chapter intro.
Material preceding a chapter element goes in the book intro.
At the moment, material preceding the book's div element goes in the
book intro also, unless it precedes Gen or Matt (in which case it goes
in the testament intro). An intro to the prophets, for example, would go
in the intro to the first book of the prophets (Isa in the current
static v11n). This is kind of a hack, but it's the best we can do with
the current v11n.
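The placement rules above can be sketched as a single pass over a flattened event stream. This Python sketch rests on stated assumptions: the ("book" | "chapter" | "verse" | "text", ...) event tuples and the "OT intro" / "Gen intro" / "Gen.1 intro" key labels are hypothetical stand-ins, not how the Sword importer actually represents keys. Trailing material after the last verse is ignored here.

```python
# Hypothetical labels for testament intros; other books use "<book> intro".
TESTAMENT_INTROS = {"Gen": "OT intro", "Matt": "NT intro"}


def assign_material(events):
    """Decide which entry each piece of material belongs to.

    events: ("book", "Gen"), ("chapter", "Gen.1"),
            ("verse", "Gen.1.1", text), or ("text", text) for anything
            between structural markers (titles, headings, prefaces...).
    """
    entries: dict[str, list[str]] = {}
    pending: list[str] = []          # material waiting for the next marker
    chapter_just_started = False

    def put(key, items):
        if items:
            entries.setdefault(key, []).extend(items)

    for event in events:
        kind = event[0]
        if kind == "text":
            pending.append(event[1])
            continue
        if kind == "book":
            book = event[1]
            # Before Gen/Matt -> testament intro; otherwise -> this book's intro.
            put(TESTAMENT_INTROS.get(book, f"{book} intro"), pending)
        elif kind == "chapter":
            book = event[1].split(".")[0]
            put(f"{book} intro", pending)        # before a chapter -> book intro
            chapter_just_started = True
        elif kind == "verse":
            osis_id, text = event[1], event[2]
            if chapter_just_started:
                chapter = osis_id.rsplit(".", 1)[0]
                put(f"{chapter} intro", pending)  # before first verse -> chapter intro
                pending = []
                chapter_just_started = False
            put(osis_id, pending + [text])       # otherwise -> the following verse
        pending = []
    return entries


events = [
    ("text", "OT preface"), ("book", "Gen"), ("text", "Gen title"),
    ("chapter", "Gen.1"), ("text", "Creation"),
    ("verse", "Gen.1.1", "In the beginning"),
    ("text", "Heading"), ("verse", "Gen.1.2", "And the earth"),
    ("text", "Prophets intro"), ("book", "Isa"), ("chapter", "Isa.1"),
    ("verse", "Isa.1.1", "The vision"),
]
result = assign_material(events)
print(result["Gen.1.2"])  # ['Heading', 'And the earth']
```

Note how "Prophets intro" lands in the Isa intro, which is exactly the hack described above.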
All this is already supported by the API. Introductions have always been
part of Sword modules. How frontends support it is not my business, but
it would be best if they rendered it properly. :D
>> We also have the option of normalizing OSIS to a form of our choosing.
>> Towards that end, we CAN require that all book/chapter/verse tags be
>> milestones.
>
> You have already noted that some OSIS container elements are not
> milestoneable. For any OSIS work with significant structural markup,
> these will result in milestones being used for verses, likely for
> chapters and possibly for book (though I am not aware of any instance of
> structure crossing a book boundary.)
I don't think anything crosses book boundaries, either, so we /could/
permit container book divs. Likewise, we could probably force chapters
to be well-formed XML. There's really only one place (Rev.12-Rev.13)
where paragraphs ever cross a chapter division. Arguably, q does at some
points (but q will often be milestoned). So we could normalize
containers that cross chapters as milestones, if that helps anyone and
provided there are no negative consequences anyone can think of.
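Normalizing a container into a milestone pair is mechanical. Below is a minimal Python sketch with xml.etree.ElementTree that rewrites `<verse osisID="X">...</verse>` as `<verse osisID="X" sID="X"/>...<verse eID="X"/>`; calling it with tag="chapter" would do the same for chapters. Using the osisID as the sID/eID value and ignoring the OSIS namespace are simplifying assumptions, and the function name is hypothetical. Elements without an osisID (such as most q elements) are skipped, since they would need generated IDs.

```python
import xml.etree.ElementTree as ET


def milestone_containers(parent: ET.Element, tag: str = "verse") -> None:
    """Rewrite container <tag osisID="X">...</tag> elements under parent
    as <tag osisID="X" sID="X"/>...<tag eID="X"/>, in place."""
    for child in list(parent):
        milestone_containers(child, tag)         # handle nested content first
    i = 0
    while i < len(parent):
        elem = parent[i]
        osis_id = elem.get("osisID")
        if (
            elem.tag != tag
            or osis_id is None
            or "sID" in elem.attrib
            or "eID" in elem.attrib
        ):
            i += 1
            continue
        start = ET.Element(tag, dict(elem.attrib), sID=osis_id)
        start.tail = elem.text                   # the container's leading text
        end = ET.Element(tag, eID=osis_id)
        end.tail = elem.tail                     # text that followed the container
        replacement = [start, *list(elem), end]
        parent.remove(elem)
        for offset, node in enumerate(replacement):
            parent.insert(i + offset, node)
        i += len(replacement)


p = ET.fromstring(
    '<p><verse osisID="Gen.1.1">In the <hi type="italic">beginning</hi> God'
    '</verse> <verse osisID="Gen.1.2">And</verse></p>'
)
milestone_containers(p)
```

The text content is unchanged; only the verse boundaries move from container tags to start/end milestones, and running it twice is harmless because existing milestones are skipped.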
> From earlier threads on quotes, there are several quote markers that
> need to be handled.
> Block vs inline quotes. (The <q> tag is used for both, but it is not
> clear when to render one or the other. These are structural elements,
> not simply rendering issues. Does OSIS define a mechanism for this?)
Block quotes need to have type="block" set.
> Beginning quote mark, continuing quote mark, end quote mark, nested
> begin/continue and end quote marks, and nested within nested quote
> marks. (I consider this to be a structural issue. Notice, there is no
> mention of the actual marks that are used.)
Nesting can be specified by the level attribute. Which mark is used is
supposed to be a style-sheet issue, hence my suggestion that we handle
it in .confs. However, there is also the n attribute, where you can put
the rendered form of the quotation mark, I believe. (I forget, but we
might have also talked about adding a rend attribute to serve this
purpose instead.)
> From a JSword perspective, we work on only the verses that the user
> wishes to see. In the context of a fragment of a larger, complicated
> quote, there will not be enough information carried in the conf to
> determine where we are in the structure of the complex quote to render
> it the same as when the entire context is shown.
> Can we include information on the <q> element concerning the kind of
> quote mark that is used? (I don't mean the actual mark)
I presume we would define something like level 1, 2, ... n marks that
begin & end a quotation and that mark both sides of a break in quotation
(according to what a language requires). English, for example, would need
levels 1 & 2, each with beginning, continuation-beginning, and end marks,
for 6 marks total (deeper levels cycle: level 3 uses the level 1 marks,
level 4 the level 2 marks, and so on). So if you hit a tag that reads
<q eID="..." level="2"/>, you know to render a single closing "9"-shaped
quotation mark (’).
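A per-language table along those lines is easy to sketch. In this hypothetical Python version, each level has (begin, continuation, end) marks, and deeper levels cycle through the defined ones, computed as (level - 1) mod 2 + 1 so that levels stay 1-based. Because level sits on every q milestone, a frontend showing only a fragment (the JSword case) can still pick the right mark without seeing where the quotation started. The table contents, the function names, and the `continuing` flag are all assumptions.

```python
# Hypothetical per-language table: level -> (begin, continuation, end).
QUOTE_MARKS = {
    "en": {
        1: ("\u201c", "\u201c", "\u201d"),  # double: opening, reopening, closing
        2: ("\u2018", "\u2018", "\u2019"),  # single: opening, reopening, closing
    },
}


def quote_mark(lang: str, level: int, position: str) -> str:
    """position is 'begin', 'continue', or 'end'."""
    levels = QUOTE_MARKS[lang]
    effective = (level - 1) % len(levels) + 1    # levels cycle: 1, 2, 1, 2, ...
    begin, cont, end = levels[effective]
    return {"begin": begin, "continue": cont, "end": end}[position]


def mark_for_q(attrs: dict, lang: str = "en", continuing: bool = False) -> str:
    """Pick the mark for a <q> milestone from its attributes alone."""
    level = int(attrs.get("level", "1"))
    if "eID" in attrs:
        return quote_mark(lang, level, "end")
    return quote_mark(lang, level, "continue" if continuing else "begin")
```

For `<q eID="..." level="2"/>`, `mark_for_q({"eID": "q1", "level": "2"})` returns U+2019, the single closing mark.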
We could do this on a per-translation basis or a per-language basis and
we could allow switching based on locale or user preference.
> While this has been limited to OSIS bibles, I would like to entertain a
> discussion on other works wrt OSIS, for the express purpose of ensuring
> that we don't make decisions that need to be revisited.
>
> Specifically, I am thinking about Nave's and Strongs, both of which have
> (at least) two interesting characteristics in common:
> 1) They have two keys. In the case of Strongs, they have a Strong's
> number and they have the word to which that number refers. Nave's is
> similar in that it has both a code and a word for that code. The basic
> difference between them is that Strong's uses the number for the key and
> displays the word along with the definition and Naves uses the word for
> the key and does not display the code. Nave's code is in the
> source as a means of cross-referencing words.
Those codes are not from Nave. They are OLB's indexing mechanism. They
should be replaced by <reference> elements that point to the entry they
represent.
> 2) Both have references to other entries. In the case of Strongs, it
> will refer from Strongs Greek to Strongs Hebrew as well as internally.
> When I tackle Naves, I want to be able to create internal
> cross-references as well as references to verses.
We should probably make the Greek & Hebrew versions a single module. The
current modules are based on databases intended for OLB, so they just
have numbers for keys (four digit numbers plus a leading 0 in the source
for Hebrew words). A better way to do this is with a leading G or H in
the key (osisID). That's how Strong's numbers are referenced in OSIS
modules, for example.
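As a rough sketch of the merge, assuming the OLB-derived sources can be read as simple number-to-entry mappings (the dictionaries and function names here are hypothetical): Greek and Hebrew entries go into one table keyed G... and H..., and the leading 0 on the Hebrew source keys is dropped. Whether keys should instead be zero-padded to a fixed width is a convention this sketch does not settle.

```python
def strongs_key(language: str, raw: str) -> str:
    """('greek', '3056') -> 'G3056'; ('hebrew', '07225') -> 'H7225'."""
    prefix = {"greek": "G", "hebrew": "H"}[language]
    return f"{prefix}{int(raw)}"   # int() drops any leading zeros


def merge_lexicons(greek: dict, hebrew: dict) -> dict:
    """Combine the two lexicons into one table keyed by prefixed number."""
    merged = {strongs_key("greek", k): v for k, v in greek.items()}
    merged.update({strongs_key("hebrew", k): v for k, v in hebrew.items()})
    return merged
```

With the prefix in the key, a Greek and a Hebrew entry with the same number can no longer collide, which is what makes a single module possible.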
Anyway, your question is really about cross-referencing. The correct way
to do that is with the reference element. Internal cross-referencing we
can probably handle pretty easily. <reference
osisRef="Moses">Moses</reference> would be used to create a reference to
the Moses entry in the same document (technically, whatever element has
osisID="Moses"). Frontends don't support this (to my knowledge), but
that's how it's supposed to be encoded.
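For the in-document case, a frontend (or the importer) only needs an index from osisID to element. Here is a minimal Python sketch, assuming namespace-free XML and a hypothetical function name. osisRef ranges and grains are not handled, and references carrying a workID prefix are skipped because they point at another work.

```python
import xml.etree.ElementTree as ET


def resolve_internal_refs(root: ET.Element):
    """Pair each workID-less <reference> with the element whose osisID matches."""
    by_id = {}
    for elem in root.iter():
        for osis_id in elem.get("osisID", "").split():
            by_id[osis_id] = elem
    links = []
    for ref in root.iter("reference"):
        target = ref.get("osisRef", "")
        if ":" in target:
            continue                    # points at another work; not handled here
        links.append((ref.text, target, by_id.get(target)))
    return links


doc = ET.fromstring(
    '<div>'
    '<div type="entry" osisID="Moses"><title>Moses</title></div>'
    '<div type="entry" osisID="Aaron">Brother of '
    '<reference osisRef="Moses">Moses</reference>.</div>'
    '</div>'
)
links = resolve_internal_refs(doc)
```

A `None` in the third position would flag a dangling reference, which an importer could report.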
References to OTHER works (modules) are going to be a headache that I
recommend we put off until Sword 3.0. :) They would require matching
osisRefs' workIDs with actual modules that use the same reference
system. It would be trivial if workIDs were required to actually match
the module ID. We could also somehow track OSIS workID/module name
correspondences through a registry. Sounds like a good project for us
all to assign to Troy next time he tries to say Sword has basically all
the features it needs. :D
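One way the registry idea might look, as a hypothetical Python sketch: split an osisRef at its workID prefix (the part before the colon), map the workID to a module name through a registry, and fall back to treating the workID as the module name itself, which is the trivial case above. The registry contents and the example workID "Bible.en.KJV" are made up for illustration. The sketch does not check that the module uses the same reference system, which a real implementation would need to do.

```python
def resolve_work(osis_ref: str, registry: dict, installed: set):
    """Return (module, ref) for an osisRef; module is None for internal refs."""
    work_id, sep, ref = osis_ref.partition(":")
    if not sep:
        return None, osis_ref                  # no workID: same document
    module = registry.get(work_id, work_id)    # default: workID == module name
    if module not in installed:
        raise LookupError(f"no installed module for workID {work_id!r}")
    return module, ref
```

Usage: `resolve_work("Bible.en.KJV:Gen.1.1", {"Bible.en.KJV": "KJV"}, {"KJV"})` maps the workID to the KJV module and returns the bare reference "Gen.1.1".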
--Chris