[jsword-devel] Direct OSIS access with JSword

DM Smith dmsmith555 at yahoo.com
Sun Sep 25 17:06:30 MST 2005


Dear Friends,

I am considering adding direct OSIS access to JSword. Any input would
be appreciated. (This would not hold up releasing 1.0, which is ready as
soon as we get the installers done. But if it is completed before the
installers are done, then it would be a part of 1.0. :)

As I see it, the primary challenge is creating a map of the OSIS file
that supports lookup by key. A successful lookup would return the offset
and length of the text associated with that key. Since OSIS does not
require that a single fragment be well formed, it may also be useful to
return the offset and length of the smallest well-formed unit that
contains the fragment.
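
A minimal sketch of such a map, assuming byte offsets into the OSIS
file and an in-memory table. The names Span, IndexEntry and
OsisOffsetIndex are hypothetical and not part of JSword; a real index
would be written to an index file rather than held in a HashMap.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Location of a span inside the OSIS data, in bytes. */
final class Span {
    final long offset;
    final int length;

    Span(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical index entry: the keyed text plus its smallest well-formed container. */
final class IndexEntry {
    final Span text;
    final Span wellFormed;

    IndexEntry(Span text, Span wellFormed) {
        this.text = text;
        this.wellFormed = wellFormed;
    }
}

/** Hypothetical key-to-span map (assumption: small enough to hold in memory). */
final class OsisOffsetIndex {
    private final Map<String, IndexEntry> entries = new HashMap<>();

    void put(String key, IndexEntry entry) {
        entries.put(key, entry);
    }

    /** Returns the entry for the key, or null if the key is not indexed. */
    IndexEntry lookup(String key) {
        return entries.get(key);
    }

    /** Copies the bytes of a span out of the unmodified OSIS data. */
    static String read(byte[] osis, Span span) {
        return new String(osis, (int) span.offset, span.length, StandardCharsets.UTF_8);
    }
}
```

The OSIS data is only read, never rewritten, so the original file stays
untouched.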

For a Bible, the lookup key would be a verse reference, and the lookup
would return the offset and length of the text for that key. This is not
simply the text of the verse; it could also include headings that stand
before the verse. While JSword does not handle intros yet, Sword
represents them as chapter 0 for a book intro, verse 0 for a chapter
intro and, I guess, book 0 for a testament intro. The challenge with
this pre-matter is determining which is an intro and which is a heading.
I think that, following OSIS2Mod, a title that stands immediately before
a book, chapter or verse would be a heading for that element. Otherwise,
it would be an intro.
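
The heading-versus-intro rule can be sketched as follows. This assumes
the document has already been flattened into a list of simplified
element labels ("title", "book", "chapter", "verse", "p"), which is an
illustration only and not how OSIS2Mod or JSword represent the input.
TitleClassifier is a hypothetical name.

```java
import java.util.List;

/** Hypothetical classifier for titles found in the pre-matter. */
final class TitleClassifier {
    enum Kind { HEADING, INTRO }

    /**
     * A title directly followed by a book, chapter or verse is a heading
     * for that element; any other title is treated as an intro.
     */
    static Kind classify(List<String> elements, int titleIndex) {
        int next = titleIndex + 1;
        if (next < elements.size()) {
            String name = elements.get(next);
            if (name.equals("book") || name.equals("chapter") || name.equals("verse")) {
                return Kind.HEADING;
            }
        }
        return Kind.INTRO;
    }
}
```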

For a dictionary, the lookup key would be the "word" for the dictionary
entry. I quote "word" because in the case of the Strong's module, the
key is the Strong's number. But this useful number is not the dictionary
entry itself. In this case, I think it would be useful to support the
notion of two keys, one a universal reference value and the other the
actual text that is used. In the case of a "Daily Devotional", the key
is a language-independent form of a date. The user should be allowed to
search by localized date. Since OSIS does not define what a dictionary
should look like, but states that the TEI dictionary schema will be the
basis, I suggest that we adopt and publish an expectation of what it
should be and also what variation of current OSIS will be acceptable.
As OSIS matures, we can migrate to the new standard.
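
A minimal sketch of the two-key idea, assuming each entry has an
integer id and that the text key is matched case-insensitively.
DualKeyIndex and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical dictionary index that accepts either of two keys per entry. */
final class DualKeyIndex {
    private final Map<String, Integer> byReference = new HashMap<>();
    private final Map<String, Integer> byText = new HashMap<>();

    /** Registers an entry under its universal reference key and its text key. */
    void add(String referenceKey, String textKey, int entryId) {
        byReference.put(referenceKey, entryId);
        byText.put(textKey.toLowerCase(Locale.ROOT), entryId);
    }

    /** Tries the reference key first, then the text key; null if neither matches. */
    Integer find(String key) {
        Integer id = byReference.get(key);
        return id != null ? id : byText.get(key.toLowerCase(Locale.ROOT));
    }
}
```

For a Strong's module, add("G0026", "agape", 7) would let a lookup by
either the number or the word reach the same entry. A devotional could
register the language-independent date as the reference key and the
localized date as the text key.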

For a commentary, the only useful form today (at least in JSword) is one
that looks just like a Bible, but with the comment for each verse
provided in place of the verse text. Where a single entry covers
multiple verses, the osisID would list each verse covered.
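
A minimal sketch of indexing such an entry, assuming the osisID
attribute holds a whitespace-separated list of verse identifiers and
that entries are numbered by integer id. CommentaryIndex is a
hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical verse-to-entry map for a commentary. */
final class CommentaryIndex {
    private final Map<String, Integer> entryByVerse = new HashMap<>();

    /** Registers one entry under every verse named in the osisID list. */
    void add(String osisIdList, int entryId) {
        for (String id : osisIdList.trim().split("\\s+")) {
            if (!id.isEmpty()) {
                entryByVerse.put(id, entryId);
            }
        }
    }

    /** Returns the entry id covering the verse, or null if there is none. */
    Integer entryFor(String verse) {
        return entryByVerse.get(verse);
    }
}
```

Every verse in the list resolves to the same entry, so the comment is
stored once in the OSIS file and found from any verse it covers.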

The way I see the code, we would have an import function that analyzes
the input and creates one or more index files. The import dialog would
have the user specify the kind of work (Bible, Dictionary, Commentary,
...) and where to get it (via URL, which could be file:/// for a local
disk, perhaps by browsing the disk). It would also request info
(BookMetaData) about the module, such as its name.

The difference between this proposal and OSIS2Mod is that OSIS2Mod, in
addition to creating an index file, also creates two files that contain
the Bible verses. These two files do not contain all of the original
input. This "direct OSIS access" would not modify the OSIS file.

The cheapest way to do this is to mimic the index structure of a Sword
module and build a Sword conf file. The alternative is to create an
OSISDriver, OSISBookMetaData and the like. I favor the latter, but only
because it feels right (not sure why).

One question remains: do we also consider the cost of handling non-KJV
versifications? This would be fairly costly, as it affects the whole of
the bit-indexed verses. Or do we append the extra material to the
preceding chapter? Or do we drop it (as OSIS2Mod does today)?

Off the top of my head, to do the alternate versifications, we could
create a Lucene index of the OSIS references for a work. When Lucene
builds an index, it numbers each entry in the order that it is added.
These numbers could be used as the bit positions in a bit set. So to
convert a verse reference to an ordinal, one would get the Lucene number
for the OSIS reference. The Lucene map would be bi-directional, so that
given a Lucene number, we could get the OSIS ID for it. The value in
this is that the mapping of a key to a position in a bit set is
abstracted to an index. This would work just as well for any
identifiable key, e.g. dictionary, commentary, thesaurus, .... If we go
this way, then we probably should go the way of OSISDriver and the like.
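
A minimal sketch of the bi-directional key-to-ordinal mapping. It uses a
plain in-memory map and list as a stand-in for the Lucene index, purely
to show the shape of the idea; KeyOrdinalMap is a hypothetical name and
the keys are illustrative.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the index: numbers keys in insertion order. */
final class KeyOrdinalMap {
    private final Map<String, Integer> ordinalByKey = new HashMap<>();
    private final List<String> keyByOrdinal = new ArrayList<>();

    /** Adds a key if it is new and returns its ordinal either way. */
    int add(String key) {
        Integer existing = ordinalByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int ordinal = keyByOrdinal.size();
        ordinalByKey.put(key, ordinal);
        keyByOrdinal.add(key);
        return ordinal;
    }

    /** Returns the ordinal for a key, or -1 if the key is unknown. */
    int ordinalOf(String key) {
        Integer ordinal = ordinalByKey.get(key);
        return ordinal == null ? -1 : ordinal;
    }

    /** Returns the key stored at the given ordinal. */
    String keyOf(int ordinal) {
        return keyByOrdinal.get(ordinal);
    }

    /** Sets one bit per known key; unknown keys are skipped. */
    BitSet toBits(Iterable<String> keys) {
        BitSet bits = new BitSet(keyByOrdinal.size());
        for (String key : keys) {
            int ordinal = ordinalOf(key);
            if (ordinal >= 0) {
                bits.set(ordinal);
            }
        }
        return bits;
    }

    /** Converts set bits back to keys, in ordinal order. */
    List<String> toKeys(BitSet bits) {
        List<String> keys = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            keys.add(keyByOrdinal.get(i));
        }
        return keys;
    }
}
```

Because ordinals come from insertion order rather than from a fixed
KJV table, a reference such as Tob.1.1 gets a bit position like any
other key, and the same structure would serve dictionary or commentary
keys.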

In His Service,
    DM

