[sword-devel] x-preverse

Fri Feb 24 06:23:50 MST 2006

Troy A. Griffitts wrote:
> OK, to reopen the issue (reluctantly)... :)

I love brainstorming and problem solving! It is precisely this problem 
that is driving a desire for a "direct OSIS" capability in JSword.

>
> The problem:  When importing a document into sword, we need to slice 
> it up into compartments that can be requested by a user. 

<snip/>

> The future plan currently is to place all text preceding a verse that 
> might get displayed before a verse in a more generic: <div 
> type="x-preverse"> so our osis filters can easily put this entire 
> section in the preverse compartment.
I don't think this can be done for all valid and correct OSIS input. 
Take, for example, this made-up fragment:
<chapter osisID="Matt.1">
<p>
<verse osisID="Matt.1.1" sID="Matt.1.1"/>
.........
<verse eID="Matt.1.1"/>
<verse osisID="Matt.1.2" sID="Matt.1.2"/>
.........</p><p>...............
<verse eID="Matt.1.2"/>
<verse osisID="Matt.1.3" sID="Matt.1.3"/>
.........
<verse eID="Matt.1.3"/>
<verse osisID="Matt.1.4" sID="Matt.1.4"/>
.........
<verse eID="Matt.1.4"/>
</p>
................ rest of chapter here ..............
</chapter>

In this case, the paragraph break needs to stand before the verse number.

Basically, the way I look at it is that a Bible should be marked up 
richly without regard to chapter and verse numbering, and then these 
numbers are inserted at the point where they should appear, probably as 
milestones. The end tags or milestones are then placed as close to the 
following marker (book, chapter or verse) as still allows for correct 
OSIS. (Not all valid OSIS is correct.)

ATM, this input will fail in osis2mod. This is a known, reported bug. At 
this time one cannot have verses inside paragraphs or, I presume, other 
containers.

>
>
> These are internal tags to make our processing faster and easier at 
> runtime.  Arguments about their non-OSIS-compliance are moot.
>
> Our "osis to osis" filter is meant to reverse any internal markup we 
> do for osis documents.

I did not know that the OSIS in a sword module shouldn't be held to the 
OSIS schema. (Does mod2osis run through OSIS 2 OSIS? It is pertinent to 
the KJV2006 work.)

However, I don't see this as a good argument for having non-OSIS when it 
could be valid OSIS just as well. Are there other things that the OSIS 
to OSIS filter needs to undo?
(If you could point me to the C++ code, I can look at it and figure it 
out myself.)

> Now, the other argument that Chris has expressed and DM has also 
> lobbied for, is placing the <verse> tag at the point where preverse 
> ends and verse starts...
>
> I can't comment on how JSword strips extra text when preparing for 
> searching, or how verse numbering is customized by the user and 
> processed by JSword.
We index everything that is not a note. For us it is not a question of 
what is or is not canonical. It is a question of what is presented in 
the flow of what the user reads. At this time we don't have the ability 
to turn headers on or off. We don't know which are canonical and which 
are not, and until we do I don't think we should have this toggle.

JSword filters all modules into OSIS and then hands it to the client so 
that it can be filtered by XSLT.
Since OSIS is well-defined, there should be examples of how to process 
it with XSLT (there are none yet), and each client can use XSLT to 
produce the look and feel that it needs.
Clients that don't want to know the ins and outs of how the markup is 
done can use the stylesheet we provide.
So far, there is only one GUI that we know of, BibleDesktop.

>   I can only say that SWORD can isolate clients of the api from 
> processing tags when rendering.  The rendering process for all of our 
> frontends is basically:
>
> for (position module at starting verse;
>      as long as I'm <= ending verse;
>      increment module position) {
>   ask module for preverse text and display it
>   show some kind of verse numbering
>   ask module for verse text and display it
> }

Troy, I don't think the clients should have to change.

If the module were true OSIS then one could rely on canonical="true" or 
canonical="false", which can be on every element but is inherited, to 
determine what is canonical text and thus what should be indexed. In the 
context of verse-at-a-time processing we can't use inheritance. So, in a 
sword OSIS module, I think that every chunk of text that is not 
canonical should have the attribute set to false on its container 
(except where that is the default for the element, such as note). 
Otherwise it should be assumed to be true, inherited from above. IIRC, 
all extra-biblical text is held in containers and not between 
milestones.
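
To illustrate, here is a minimal Python sketch of resolving the 
effective canonical value inside a single chunk. It assumes the chunk is 
well-formed XML without namespaces and that anything not marked 
otherwise starts out canonical. The names canonical_text and 
DEFAULT_NON_CANONICAL are made up for this example; they are not SWORD 
or JSword API.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: elements that default to non-canonical.
# The post only names <note>; a real list would come from the OSIS schema.
DEFAULT_NON_CANONICAL = {"note"}

def canonical_text(elem, inherited=True):
    """Collect the text whose effective canonical value is true."""
    attr = elem.get("canonical")
    if attr is not None:
        canonical = (attr == "true")      # explicit attribute wins
    elif elem.tag in DEFAULT_NON_CANONICAL:
        canonical = False                 # element-specific default
    else:
        canonical = inherited             # inherited from the container
    parts = []
    if canonical and elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(canonical_text(child, canonical))
        # Tail text follows the child, so it belongs to this element.
        if canonical and child.tail:
            parts.append(child.tail)
    return "".join(parts)

chunk = ET.fromstring(
    '<div><title canonical="false">Heading</title>'
    'first part<note>a footnote</note> second part</div>')
print(canonical_text(chunk))  # first part second part
```

Because the chunk is processed on its own, the heading is only skipped 
because the attribute is present on its container, which is the reason 
for asking that it be explicit in a sword OSIS module.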

Some of the OSIS modules I have gotten from CrossWire have verse begin 
markers. There were some postings as to whether this was correct or not. 
In JSword, it led to the appearance of verse numbers twice. So we had to 
put in extra processing to get it to work correctly. It may be that the 
SWORD API frontends have been modified to handle this problem. In the 
latest incarnation of osis2mod.exe in the utils area on the CrossWire 
server, it leaves the begin tag but strips the end tag. This forces the 
use of the milestoned version of verses for the module to work in 
JSword. I don't know if all of these have been fixed. WLC was an example.

That said, I don't see why any front end needs to be changed to have a 
different structure in the module.
If I can guess at the "rest of the story":

Client requests verse.
Sword gets "verse" from module using the index to determine where and 
how much to read. (Let's call this raw text.)
Sword then takes it and analyzes it, determining what are notes, Strong's 
and morph markup, and what is preverse, and builds a data structure to 
represent what it finds.
Sword exposes this structure to the Client so that the above 
algorithm works.

If this is the case, then it does not matter to the sword api's client 
how the verse is marked up. It is up to the sword api to sort things out 
and hand back to the client what is requested.

> To embed verse numbering inside output from the engine would move tag 
> processing from the filters and place the burden on clients of the 
> engine.

No, it changes the code that figures out what to call preverse text by 
simplifying it.

If the raw text is OSIS and has verse markers, then everything standing 
before the verse marker is pre-verse, and everything after the verse 
marker until the end marker is marked-up verse. Anything standing 
after the end marker is post-verse. (In some non-KJV v11n there is 
additional text that is outside the last verse of a chapter and may be 
canonical, such as the closing of an epistle.)
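
A minimal Python sketch of that split, assuming each raw chunk holds at 
most one milestoned verse and that the markers look like the sID/eID 
milestones in the example earlier in this message. split_verse_chunk is 
a made-up name, and the regular expressions are a simplification, not a 
real OSIS parser.

```python
import re

# Matches a milestone verse start such as
# <verse osisID="Matt.1.1" sID="Matt.1.1"/> and captures the sID value.
VERSE_START = re.compile(r'<verse\b[^>]*\bsID="([^"]+)"[^>]*/>')

def split_verse_chunk(raw):
    """Return (preverse, verse, postverse) for one raw chunk."""
    start = VERSE_START.search(raw)
    if start is None:
        # No start marker: treat the whole chunk as verse text.
        return "", raw, ""
    # The end marker must carry the same ID as the start marker.
    verse_end = re.compile(
        r'<verse\b[^>]*\beID="%s"[^>]*/>' % re.escape(start.group(1)))
    end = verse_end.search(raw, start.end())
    if end is None:
        # Start marker only: everything after it is verse text.
        return raw[:start.start()], raw[start.end():], ""
    return (raw[:start.start()],
            raw[start.end():end.start()],
            raw[end.end():])

chunk = ('<title>Heading</title>'
         '<verse osisID="Matt.1.1" sID="Matt.1.1"/>verse text'
         '<verse eID="Matt.1.1"/><p>closing text</p>')
pre, verse, post = split_verse_chunk(chunk)
print(pre)    # <title>Heading</title>
print(verse)  # verse text
print(post)   # <p>closing text</p>
```

Nothing in the split depends on what kind of element stands before the 
marker, so no dedicated preverse container is needed for it to work.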

> This would require rewrites for all frontends and I feel the better 
> design is to keep the tag processing modularized and isolated inside 
> our filter mechanism.

As I said, I think that it can be hidden from the front ends exactly as 
it is today.

>
> This is the reasoning for the current implementation and it is not as 
> much of a 'hack' as Chris might think :)  It is a difficult problem to 
> compartmentalize an annotated Biblical text and still provide a 
> concise api to its content.  Not to put words into our good friend Bob 
> at Logos, but I remember him also conveying, in one of our OSIS 
> meetings early on, that they have markers for 'display regions' so 
> they know how much of a document to display when a user asks for, e.g. 
> "Jas.1.1.".  SWORD effectively does the same thing by placing the 
> 'display region' in the verse, but splitting into 2 compartments: 
> verse text, and preverse text.

This idea of a display region needs to be expanded. We've talked about 
this before. When the text has non-trivial markup, such as the poetic 
verse structure of the Psalms, one needs the entire display region to 
figure out how to display a verse requested from it.

There are two regions that would be useful:

One is the display context, such that all of it is necessary to get the 
rendering of a verse correct.

The second is the XML well-formedness context, such that all of it is 
necessary to get a verse to be well-formed.

Often these would be the same, but once nesting comes into play, either 
physical or logical (blockquotes represent a nesting that might not be 
physical), they might differ.
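
As one ingredient of the well-formedness context, here is a small Python 
sketch that reports which elements are still open at the point where a 
verse starts. It uses a simplified regex tag scanner that assumes 
well-nested input with no ">" inside attribute values; open_elements is 
a made-up name for illustration.

```python
import re

# Captures: optional "/" of a closing tag, the element name,
# and the optional "/" of an empty (milestone) element.
TAG = re.compile(r'<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>')

def open_elements(xml_text, position):
    """Names of the elements still open at position, outermost first."""
    stack = []
    for match in TAG.finditer(xml_text, 0, position):
        closing, name, empty = match.groups()
        if empty:
            continue          # milestones never open a container
        if closing:
            stack.pop()       # assumes the input is well nested
        else:
            stack.append(name)
    return stack

doc = ('<chapter osisID="Matt.1"><p>'
       '<verse osisID="Matt.1.1" sID="Matt.1.1"/>...<verse eID="Matt.1.1"/>'
       '<verse osisID="Matt.1.2" sID="Matt.1.2"/>...</p><p>...'
       '<verse eID="Matt.1.2"/>'
       '<verse osisID="Matt.1.3" sID="Matt.1.3"/>...<verse eID="Matt.1.3"/>'
       '</p></chapter>')
at_verse_3 = doc.index('<verse osisID="Matt.1.3"')
print(open_elements(doc, at_verse_3))  # ['chapter', 'p']
```

Wrapping a verse in the elements reported here is enough only when the 
verse ends inside the same containers. Matt.1.2 in the example starts in 
one paragraph and ends in the next, so its context has to cover both.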

>
> To be fair, a problematic issue is still Psalm titles.  They are 
> canonical and should be searched when the user does a search of the 
> Biblical text, but they should be displayed before any verse number 
> the application decides to show. 
So one needs to know when preverse is or is not canonical and index it 
when it is.

Not having looked at the code, I think this can be handled fairly 
trivially by extending the call that gets preverse with a flag that 
says whether to return non-canonical material. This way apps could turn 
off headers by passing the flag when getting preverse material, but the 
flag would be ignored if the material is canonical.

When creating the index, both the canonical preverse and the verse text 
could be fetched. In the case of non-OSIS texts, it would work as it does 
today: no preverse is considered canonical.
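
A rough Python sketch of such a flag, written without having looked at 
the real code. PreverseSegment, get_preverse and index_text are made-up 
names, and the canonical marking is assumed to have been worked out 
already when the module was read.

```python
from dataclasses import dataclass

@dataclass
class PreverseSegment:
    text: str
    canonical: bool  # e.g. True for a Psalm title, False for an added heading

def get_preverse(segments, include_non_canonical=True):
    """Return preverse texts; canonical ones are returned regardless of the flag."""
    return [s.text for s in segments
            if s.canonical or include_non_canonical]

def index_text(segments, verse_text):
    """Text to index for one verse: canonical preverse plus the verse text."""
    canonical_only = get_preverse(segments, include_non_canonical=False)
    return " ".join(canonical_only + [verse_text])

segments = [PreverseSegment("psalm title", True),
            PreverseSegment("editor heading", False)]
print(get_preverse(segments))                               # ['psalm title', 'editor heading']
print(get_preverse(segments, include_non_canonical=False))  # ['psalm title']
print(index_text(segments, "verse text"))                   # psalm title verse text
```

For a non-OSIS text every segment would simply carry canonical=False, 
which gives today's behaviour.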