[jsword-devel] Direct OSIS access with JSword
DM Smith
dmsmith555 at yahoo.com
Sun Sep 25 17:06:30 MST 2005
Dear Friends,
I am considering adding direct OSIS access to JSword. Any input would
be appreciated. (This would not hold up the 1.0 release, which is ready
as soon as we get the installers done. But if it is completed before the
installers are done, then it would be part of 1.0 :)
As I see it, the primary challenge is creating a map of the OSIS file
that supports lookup by key. When a key is found, the map would return
the offset and length of the text associated with that key. Since OSIS
does not require that a single fragment be well formed, it may also be
useful to return the offset and length of the smallest enclosing unit
that is well formed.
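A minimal sketch of such a map, assuming byte offsets and an in-memory
table. The names OsisSpan, OsisIndexEntry and OsisIndex are hypothetical,
not existing JSword classes. Each entry carries two spans: the key's own
text and the smallest well-formed unit that encloses it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** A byte range in the OSIS file (hypothetical type). */
record OsisSpan(long offset, int length) {
    long end() { return offset + length; }
}

/**
 * Hypothetical index entry: where the key's text lives, plus the smallest
 * enclosing unit that is well formed on its own.
 */
record OsisIndexEntry(OsisSpan text, OsisSpan wellFormed) {
    OsisIndexEntry {
        // The well-formed unit must fully contain the key's text.
        if (wellFormed.offset() > text.offset() || wellFormed.end() < text.end()) {
            throw new IllegalArgumentException("well-formed span must enclose text span");
        }
    }
}

/** Key -> location map; an assumption, not JSword's API. */
final class OsisIndex {
    private final Map<String, OsisIndexEntry> entries = new HashMap<>();

    void put(String key, OsisIndexEntry entry) {
        entries.put(key, entry);
    }

    /** Empty when the key is not present in the work. */
    Optional<OsisIndexEntry> lookup(String key) {
        return Optional.ofNullable(entries.get(key));
    }
}
```

A caller that only wants raw text uses text(). A caller that needs to
hand the result to an XML parser uses wellFormed() instead.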
For a Bible, the lookup key would be a verse reference, and a lookup
would return the offset and length of the text that belongs to that key.
This is not simply the text of the verse; it could also include headings
that stand before the verse. While JSword does not handle intros yet,
Sword represents them as chapter 0 for a book intro, verse 0 for a
chapter intro and, I guess, book 0 for a testament intro. The challenge
with this pre-matter is determining which is an intro and which is a
heading. I think that, following OSIS2Mod, a title that stands
immediately before a book, chapter or verse would be a heading for that
element. Otherwise, it would be an intro.
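The heading-versus-intro rule above can be sketched as a small
classifier. All names here are hypothetical. The intro key follows the
Sword convention described above, assuming "Book.chapter.verse" keys,
so a book intro lands at chapter 0, verse 0. Testament intros (book 0)
are left out of the sketch.

```java
/** What immediately follows a title in the OSIS stream (hypothetical). */
enum FollowingElement { BOOK, CHAPTER, VERSE, OTHER }

/** What the title turned out to be. */
enum TitleRole { HEADING, INTRO }

final class TitleClassifier {
    /**
     * Rule borrowed from OSIS2Mod: a title that stands immediately before
     * a book, chapter or verse is a heading for that element; anything
     * else is treated as intro material.
     */
    static TitleRole classify(FollowingElement next) {
        return switch (next) {
            case BOOK, CHAPTER, VERSE -> TitleRole.HEADING;
            case OTHER -> TitleRole.INTRO;
        };
    }

    /**
     * Sword-style key for intro text: chapter 0 holds the book intro and
     * verse 0 of a chapter holds that chapter's intro.
     */
    static String introKey(String book, int chapter) {
        return book + "." + chapter + ".0";
    }
}
```

For example, introKey("Gen", 0) gives "Gen.0.0" (book intro) and
introKey("Gen", 1) gives "Gen.1.0" (chapter intro).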
For a dictionary, the lookup key would be the "word" for the dictionary
entry. I quote "word" because in the case of the Strong's module, the key
is the Strong's number, which is useful but is not the dictionary entry
itself. In this case, I think it would be useful to support the notion
of two keys: a universal reference value and the actual text that is
used. In the case of a "Daily Devotional," the key is a
language-independent form of a date, but the user should be allowed to
search by a localized date. Since OSIS does not define what a dictionary
should look like, but states that the TEI dictionary schema will be the
basis, I suggest that we adopt and publish an expectation of what it
should be, along with which variations of current OSIS will be
acceptable. As OSIS matures, we can migrate to the new standard.
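A sketch of the two-key idea for the devotional case, using java.time.
The "MM.dd" neutral form and the names EntryKey and DevotionalKeys are
assumptions for illustration; the post does not fix a date format. The
reference is stable across languages, and the display text is built
from the user's locale.

```java
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical two-part key: stable reference plus user-facing text. */
record EntryKey(String reference, String displayText) {}

final class DevotionalKeys {
    // Assumption: the language-independent form is "MM.dd", e.g. "01.31".
    private static final DateTimeFormatter NEUTRAL = DateTimeFormatter.ofPattern("MM.dd");

    /** Builds both keys for one day; only displayText depends on locale. */
    static EntryKey forDate(MonthDay day, Locale locale) {
        String reference = NEUTRAL.format(day);
        String display = DateTimeFormatter.ofPattern("MMMM d", locale).format(day);
        return new EntryKey(reference, display);
    }

    /** Turns a neutral reference back into a date, e.g. after a search. */
    static MonthDay parse(String reference) {
        return MonthDay.parse(reference, NEUTRAL);
    }
}
```

The same record would hold a Strong's number as the reference and the
entry's headword as the display text.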
For a commentary, the only useful form today (at least in JSword) is one
that is structured just like a Bible, except that instead of the verse
text, the comment for that verse is provided. Where a single entry covers
multiple verses, its osisID would list each verse covered.
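OSIS lets an osisID attribute hold several whitespace-separated
references, so indexing a multi-verse comment amounts to pointing every
listed verse at the same entry. The sketch below is illustrative only.
CommentaryIndex is a hypothetical name, and the plain int entry number
stands in for whatever location record the real index would keep.

```java
import java.util.HashMap;
import java.util.Map;

final class CommentaryIndex {
    private final Map<String, Integer> verseToEntry = new HashMap<>();

    /**
     * Registers one commentary entry under every verse in its osisID,
     * e.g. osisID="Gen.1.1 Gen.1.2" maps both verses to entryNumber.
     */
    void addEntry(String osisIdAttribute, int entryNumber) {
        for (String ref : osisIdAttribute.trim().split("\\s+")) {
            verseToEntry.put(ref, entryNumber);
        }
    }

    /** The entry covering this verse, or null if none does. */
    Integer entryFor(String verse) {
        return verseToEntry.get(verse);
    }
}
```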
As I see the code, we would have an import function that analyzes the
input and creates one or more index files. The import dialog would have
the user specify the kind of work (Bible, Dictionary, Commentary, ...)
and where to get it (via a URL, which could be file:/// for the local
disk, perhaps chosen by browsing the disk). It would also request
information about the module (BookMetaData), such as its name.
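The information the dialog gathers could be captured in a small value
type like the one below. WorkKind and ImportRequest are hypothetical
names, not JSword's existing BookCategory or BookMetaData. The sketch
only validates input and distinguishes local files from remote URLs.

```java
import java.net.URI;
import java.util.Objects;

/** Kinds of work the import dialog could offer (hypothetical). */
enum WorkKind { BIBLE, DICTIONARY, COMMENTARY, DAILY_DEVOTIONAL }

/** What the import dialog collects before indexing an OSIS file. */
record ImportRequest(WorkKind kind, URI source, String name) {
    ImportRequest {
        Objects.requireNonNull(kind, "kind");
        Objects.requireNonNull(source, "source");
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("module name is required");
        }
    }

    /** True when the OSIS file is on local disk (a file:/// URL). */
    boolean isLocal() {
        return "file".equalsIgnoreCase(source.getScheme());
    }
}
```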
The difference between this proposal and OSIS2Mod is that OSIS2Mod, in
addition to creating an index file, also creates two files that contain
the Bible verses. These two files do not contain all of the original
input. This "direct OSIS access" would not modify the OSIS file.
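Because the OSIS file is left untouched, retrieval could simply seek to
the indexed location and read it. This is a minimal sketch with
RandomAccessFile opened read-only. OsisReader is a hypothetical name.
The sketch assumes the index stores byte offsets that fall on UTF-8
character boundaries.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class OsisReader {
    /**
     * Reads bytes [offset, offset + length) from the original OSIS file,
     * opened read-only, and decodes them as UTF-8.
     */
    static String read(Path osisFile, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(osisFile.toFile(), "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf); // throws EOFException if the range runs past the end
            return new String(buf, StandardCharsets.UTF_8);
        }
    }
}
```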
The cheapest way to do this is to mimic the index structure of a Sword
module and build a Sword conf file. The alternative is to create an
OSISDriver, OSISBookMetaData and the like. I favor the latter, but only
because it feels right (I'm not sure why).
One question remains: do we also take on the cost of handling non-KJV
versifications? This would be fairly costly, as it affects everything
built on bit-indexed verses. Or do we append the extra material to the
preceding chapter? Or do we drop it (as OSIS2Mod does today)?
Off the top of my head, to support alternate versifications we could
create a Lucene index of the OSIS references for a work. When Lucene
builds an index, it numbers each entry in the order that it is added.
These numbers could be used as bit positions in a bit set. So to convert
a verse reference to an ordinal, one would get the Lucene number for the
OSIS reference. The Lucene map would be bidirectional, so that given a
Lucene number, we could get the OSIS ID for it. The value in this is
that the mapping of a key to a position in a bit set is abstracted into
an index. This would work just as well for any identifiable key, e.g.
dictionary, commentary, thesaurus, .... If we go this way, then we
probably should go the way of OSISDriver and the like.
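The ordinal idea can be tried out without Lucene. The sketch below swaps
in a HashMap plus an ArrayList, which assign ordinals in insertion order
the way the post describes Lucene numbering entries. KeyOrdinals is a
hypothetical name. The bidirectional mapping then drives a
java.util.BitSet. "Ps.151.1" is used only as an example of a verse
outside the KJV versification.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Bidirectional key <-> ordinal map; ordinals follow insertion order. */
final class KeyOrdinals {
    private final Map<String, Integer> toOrdinal = new HashMap<>();
    private final List<String> toKey = new ArrayList<>();

    /** Returns the key's ordinal, assigning the next one if it is new. */
    int add(String osisId) {
        return toOrdinal.computeIfAbsent(osisId, k -> {
            toKey.add(k);
            return toKey.size() - 1;
        });
    }

    int ordinal(String osisId) {
        Integer n = toOrdinal.get(osisId);
        if (n == null) throw new IllegalArgumentException("unknown key: " + osisId);
        return n;
    }

    String key(int ordinal) {
        return toKey.get(ordinal);
    }

    /** One bit per key, e.g. the verses hit by a search. */
    BitSet toBits(Iterable<String> osisIds) {
        BitSet bits = new BitSet(toKey.size());
        for (String id : osisIds) bits.set(ordinal(id));
        return bits;
    }

    /** Back from bits to keys, in ordinal order. */
    List<String> fromBits(BitSet bits) {
        List<String> keys = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            keys.add(toKey.get(i));
        }
        return keys;
    }
}
```

Since nothing in the map assumes KJV verse counts, dictionary keys or
commentary keys could be registered the same way.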
In His Service,
DM