[sword-devel] x-preverse
DM Smith
dmsmith555 at yahoo.com
Fri Feb 24 06:23:50 MST 2006
Troy A. Griffitts wrote:
> OK, to reopen the issue (reluctantly)... :)
I love brainstorming and problem solving! It is precisely this problem
that is driving a desire for a "direct OSIS" capability in JSword.
>
> The problem: When importing a document into sword, we need to slice
> it up into compartments that can be requested by a user.
<snip/>
> The future plan currently is to place all text preceding a verse that
> might get displayed before a verse in a more generic: <div
> type="x-preverse"> so our osis filters can easily put this entire
> section in the preverse compartment.
I don't think this can be done for all valid and correct OSIS input.
Take, for example, this made-up case:
<chapter osisID="Matt.1">
<p>
<verse osisID="Matt.1.1" sID="Matt.1.1"/>
.........
<verse eID="Matt.1.1"/>
<verse osisID="Matt.1.2" sID="Matt.1.2"/>
.........</p><p>...............
<verse eID="Matt.1.2"/>
<verse osisID="Matt.1.3" sID="Matt.1.3"/>
.........
<verse eID="Matt.1.3"/>
<verse osisID="Matt.1.4" sID="Matt.1.4"/>
.........
<verse eID="Matt.1.4"/>
</p>
................ rest of chapter here ..............
</chapter>
In this case, the paragraph break needs to stand before the verse number.
Basically, the way I look at it is that a Bible should be marked up
richly without regard to chapter and verse numbering and then these
numbers are inserted at the point they should appear, probably as
milestones. The end tags or milestones are then placed as close to the
following marker (book, chapter or verse) as still allows for correct
OSIS. (Not all valid OSIS is correct.)
ATM, osis2mod fails on this input. This is a known, reported bug. One
cannot at this time have verses in paragraphs or, I presume, in other
containers.
>
>
> These are internal tags to make our processing faster and easier at
> runtime. Arguments about their non-OSIS-compliance are moot.
>
> Our "osis to osis" filter is meant to reverse any internal markup we
> do for osis documents.
I did not know that the OSIS in a sword module shouldn't be held to the
OSIS schema. (Does mod2osis run through OSIS 2 OSIS? It is pertinent to
the KJV2006 work.)
However, I don't see this as a good argument for having non-OSIS when it
could be valid OSIS just as well. Are there other things that the OSIS
to OSIS filter needs to undo?
(If you could point me to the C++ code, I can look at it and figure it
out myself.)
> Now, the other argument that Chris has expressed and DM has also
> lobbied for, is placing the <verse> tag at the point where preverse
> ends and verse starts...
>
> I can't comment on how JSword strips extra text when preparing for
> searching, or how verse numbering is customized by the user and
> processed by JSword.
We index everything that is not a note. For us it is not a question of
what is or is not canonical. It is a question of what is presented in
the flow of what the user reads. At this time we don't have the ability
to turn on/off headers. We don't know which are canonical and which are
not, and until we do I don't think we should have this toggle.
JSword filters all modules into OSIS and then hands it to the client so
that it can be filtered by xslt.
Since OSIS is well-defined, there should be examples of how to process
it with xslt (not yet though) and each client can use it to produce the
look and feel that they need.
If they don't want to know the ins and outs of how the markup is done
they can use the one we provide.
So far, there is only one GUI that we know of, BibleDesktop.
> I can only say that SWORD can isolate clients of the api from
> processing tags when rendering. The rendering process for all of our
> frontends is basically:
>
> for (position module at starting verse;
> as long as I'm <= ending verse;
> increment module position) {
> ask module for preverse text and display it
> show some kind of verse numbering
> ask module for verse text and display it
> }
Troy, I don't think the clients should have to change.
If the module were true OSIS then one could rely on canonical="true" or
canonical="false" which can be on every element, but is inherited, to
determine what is canonical text and thus what should be indexed. In the
context of verse at a time processing we can't use inheritance. So, in a
sword OSIS module, I think that every chunk of text that is not
canonical should have the attribute explicitly set to false on its
container (except where it is the default for that element, such as
note). It should be assumed to be true, inherited from above, otherwise.
IIRC, all extra-biblical text is held in containers and not between
milestones.
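To make the inheritance rule concrete, here is a minimal Python sketch (not SWORD or JSword code). It assumes un-namespaced element names and that note is the only element whose default is non-canonical, since that is the only one named above; the sample fragment is made up. It collects the canonical text of a whole fragment, which is exactly the walk that verse-at-a-time processing cannot do and that explicit attributes would make unnecessary.

```python
import xml.etree.ElementTree as ET

# Assumption: only <note> defaults to non-canonical (the one case named above).
NONCANONICAL_BY_DEFAULT = {"note"}

def is_canonical(elem, inherited=True):
    """Explicit attribute wins, then the element default, then the parent's value."""
    value = elem.get("canonical")
    if value is not None:
        return value == "true"
    if elem.tag in NONCANONICAL_BY_DEFAULT:
        return False
    return inherited

def canonical_text(elem, inherited=True):
    """Return the list of canonical text pieces under elem, in document order."""
    canon = is_canonical(elem, inherited)
    parts = [elem.text or ""] if canon else []
    for child in elem:
        parts.extend(canonical_text(child, canon))
        if canon:
            # Text following a child belongs to the parent element.
            parts.append(child.tail or "")
    return parts

# Made-up fragment: a non-canonical heading, verse text, and a note.
fragment = ET.fromstring(
    '<p><title canonical="false">Heading</title>'
    'first words<note>a note</note> last words</p>'
)
indexable = "".join(canonical_text(fragment))
```

Here indexable holds only the text outside the heading and the note.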
Some of the OSIS modules I have gotten from CrossWire have verse begin
markers. There were some postings as to whether this was correct or not.
In JSword, it led to the appearance of verse numbers twice. So we had to
put in extra processing to get it to work correctly. It may be that the
SWORD API frontends have been modified to handle this problem. In the
latest incarnation of osis2mod.exe in the utils area on the CrossWire
server, it leaves the begin tag but strips the end tag. This forces the
use of the milestoned version of verses for the module to work in
JSword. I don't know if all of these have been fixed. WLC was an example.
That said, I don't see why any front end needs to be changed to have a
different structure in the module.
If I can guess at the "rest of the story":
Client requests verse.
Sword gets "verse" from module using the index to determine where and
how much to read. (Let's call this raw text)
Sword then takes it and analyzes it, determining what is a note, what is
Strong's or morph markup, and what is preverse, and builds a data
structure to represent what it finds.
Sword exposes this structure to the Client so that the above
algorithm works.
If this is the case, then it does not matter to the sword api's client
how the verse is marked up. It is up to the sword api to sort things out
and hand back to the client what is requested.
> To embed verse numbering inside output from the engine would move tag
> processing from the filters and place the burden on clients of the
> engine.
No, it changes the code that figures out what to call preverse text by
simplifying it.
If the raw text is OSIS and has verse markers then everything standing
before the verse marker is pre-verse and everything after the verse
marker until the end marker is marked up verse. And anything standing
after the end marker is post-verse. (In some non-KJV v11n there is
additional text that is outside the last verse of a chapter and may be
canonical, such as the closing of an epistle.)
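A minimal Python sketch of that splitting rule, assuming the raw entry for a verse uses milestone markers with sID and eID attributes as in the made-up example earlier. The function name and the sample entry are hypothetical, and this is not how the SWORD filters are actually written.

```python
import re

# Milestone verse markers: a start carries sID, an end carries eID.
START = re.compile(r'<verse\b[^>]*\bsID="[^"]*"[^>]*/>')
END = re.compile(r'<verse\b[^>]*\beID="[^"]*"[^>]*/>')

def split_raw(raw):
    """Split one raw entry into (preverse, verse, postverse)."""
    start = START.search(raw)
    if start is None:
        # No start marker: treat the whole entry as verse text.
        return "", raw, ""
    end = END.search(raw, start.end())
    if end is None:
        # End marker missing (e.g. stripped on import): verse runs to the end.
        return raw[:start.start()], raw[start.end():], ""
    return raw[:start.start()], raw[start.end():end.start()], raw[end.end():]

# Made-up entry: a heading, one verse, and a trailing paragraph close.
raw = ('<title>Heading</title>'
       '<verse osisID="Matt.1.1" sID="Matt.1.1"/>verse text'
       '<verse eID="Matt.1.1"/></p>')
pre, verse, post = split_raw(raw)
```

With a rule like this, the front-end loop quoted above would not change: the engine would still hand back preverse text and verse text separately.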
> This would require rewrites for all frontends and I feel the better
> design is to keep the tag processing modularized and isolated inside
> our filter mechanism.
As I said, I think that it can be hidden from the front ends exactly as
it is today.
>
> This is the reasoning for the current implementation and it is not as
> much of a 'hack' as Chris might think :) It is a difficult problem to
> compartmentalize an annotated Biblical text and still provide a
> concise api to its content. Not to put words into our good friend Bob
> at Logos, but I remember him also conveying, in one of our OSIS
> meetings early on, that they have markers for 'display regions' so
> they know how much of a document to display when a user asks for, e.g.
> "Jas.1.1.". SWORD effectively does the same thing by placing the
> 'display region' in the verse, but splitting into 2 compartments:
> verse text, and preverse text.
This idea of a display region needs to be expanded. We've talked about
this before. When the text has non-trivial markup, such as Psalms poetic
verse structure, one needs the entire display region to figure out how
to display a verse requested from it.
There are two regions that would be useful:
One is the display context, such that all of it is necessary to get the
rendering of a verse correct.
The second is xml well-formed context, such that all of it is necessary
to get a verse to be well-formed.
Often these would be the same, but when nesting occurs, either physical
or logical (blockquotes represent a nesting that might not be physical),
they might differ.
>
> To be fair, a problematic issue is still Psalm titles. They are
> canonical and should be searched when the user does a search of the
> Biblical text, but they should be displayed before any verse number
> the application decides to show.
So one needs to know when preverse is or is not canonical and index it
when it is.
Not having looked at the code, I think this can be handled fairly
trivially by giving the call that gets preverse a flag that says whether
to return non-canonical material. This way apps could turn off headers
by passing the flag, but the flag would be ignored if the material is
canonical.
When creating the index, both the canonical preverse and the verse text
could be gotten. In the case of non-OSIS texts, it would work as it does
today, no preverse is considered canonical.
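A minimal Python sketch of such a flag. All names here are hypothetical and nothing below is the SWORD or JSword API; it assumes each piece of preverse material already carries a canonical marker.

```python
from dataclasses import dataclass

@dataclass
class Preverse:
    text: str
    canonical: bool

def get_preverse(items, include_noncanonical=True):
    """Canonical preverse is always returned; the flag only gates the rest."""
    return [p.text for p in items if p.canonical or include_noncanonical]

def text_to_index(items, verse_text):
    """For indexing: canonical preverse plus the verse text."""
    return " ".join(get_preverse(items, include_noncanonical=False) + [verse_text])

# Made-up material: an editorial heading and a canonical Psalm title.
items = [Preverse("Editorial heading", False), Preverse("Psalm title", True)]
headers_off = get_preverse(items, include_noncanonical=False)
headers_on = get_preverse(items)
indexed = text_to_index(items, "verse text")
```

For a non-OSIS text every item would be non-canonical, so indexing would see only the verse text, as it does today.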