[sword-devel] virtual modules
DM Smith
dmsmith555 at yahoo.com
Sat Jan 21 15:22:54 MST 2006
Chris Little wrote:
>
> Troy made a comment to me when we were in Philadelphia for the last
> OSIS conference about Sword (the library) nearing a feature-complete
> state, where we've pretty much got the capability to do all the basic
> stuff that anyone else is doing. Going forward, most of the work in
> Sword (ignoring new module acquisitions/licensing and frontend work)
> is going to be in the area of doing NEW things like this with our
> existing data.
>
I think that there is more that can be done in the API.
One of the things I am planning to work on in JSword is the ability to
work with OSIS directly. As I studied the various Sword modules, I found
that they consist of a representation of the text plus various indexes
into that representation for the sake of performance. (Yes, a gross
simplification!) The indexes are a must; performance would be horrible
otherwise.
I think such indexes could be created quite quickly using standard XML
parsing techniques, e.g. an XML pull parser. Rather than creating a custom
index, I'm thinking of creating a Lucene index keyed on osisID, storing
with each entry the start offset and length of the text in the original
document. I would also like to figure out how to represent additional
information when such a fragment is not well formed (for example, a verse
that starts in one paragraph and ends in another).
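A minimal sketch of that indexing pass, using the JDK's own StAX pull parser (javax.xml.stream), which may or may not be the "xml pull parser" meant above. Several things here are assumptions for illustration, not JSword code: the class name OsisOffsetIndex is hypothetical; a plain HashMap stands in for the Lucene index so the example stays self-contained; offsets are measured in the extracted character data rather than in the raw document; and an osisID holding several space-separated IDs is not split. Container elements are indexed when their end tag is reached, and sID/eID milestone pairs are indexed from the start milestone to the matching end milestone, which is one way to cover a verse that starts in one paragraph and ends in another.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class: a HashMap stands in for the Lucene index.
public class OsisOffsetIndex {
    private final StringBuilder text = new StringBuilder();   // extracted character data
    private final Map<String, int[]> index = new HashMap<>(); // osisID -> {start, length}

    public OsisOffsetIndex(String osisXml) {
        Deque<String> openIds = new ArrayDeque<>();     // one entry per open element; "" = not indexed
        Deque<Integer> openStarts = new ArrayDeque<>(); // text offset where each open element began
        Map<String, Integer> msStart = new HashMap<>(); // sID -> start offset of the milestoned passage
        Map<String, String> msOsisId = new HashMap<>(); // sID -> osisID of the milestoned passage
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(osisXml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String osisID = r.getAttributeValue(null, "osisID");
                    String sID = r.getAttributeValue(null, "sID");
                    String eID = r.getAttributeValue(null, "eID");
                    if (sID != null && osisID != null) {
                        // start milestone: the passage begins here
                        msStart.put(sID, text.length());
                        msOsisId.put(sID, osisID);
                    } else if (eID != null && msStart.containsKey(eID)) {
                        // end milestone: close the passage opened by the matching sID
                        int start = msStart.remove(eID);
                        index.put(msOsisId.remove(eID), new int[] {start, text.length() - start});
                    }
                    // ordinary (non-milestone) elements with an osisID are indexed at their end tag
                    boolean container = osisID != null && sID == null && eID == null;
                    openIds.push(container ? osisID : "");
                    openStarts.push(text.length());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String id = openIds.pop();
                    int start = openStarts.pop();
                    if (!id.isEmpty()) {
                        index.put(id, new int[] {start, text.length() - start});
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(r.getText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Returns the text stored under this osisID, or null if it was not indexed. */
    public String lookup(String osisID) {
        int[] span = index.get(osisID);
        return span == null ? null : text.substring(span[0], span[0] + span[1]);
    }
}
```

For example, given `<div osisID="Gen.1"><p><verse osisID="Gen.1.1" sID="v1"/>In the beginning </p><p>God created.<verse eID="v1"/></p></div>`, the lookup of "Gen.1.1" spans both paragraphs even though the verse is not well formed with respect to them. Storing only (start, length) keeps the index small; the text itself stays in one place.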
Another advantage of such a scheme is that it goes a long way toward
supporting alternate versification. That is, a user's input can be
converted fairly easily to osisIDs, and these can be used for lookup.
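A rough sketch of the "user input to osisID" step, assuming references of the form "Book chapter[:verse]". The class RefToOsisId and its tiny book-name table are hypothetical; a real table would cover every book and its localized names and abbreviations, and mapping between versifications would be a separate step applied to the resulting osisIDs.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns "Genesis 1:3" or "Mt. 5" into an osisID such as "Gen.1.3".
public class RefToOsisId {
    // Illustrative only: a few user-facing names mapped to OSIS book abbreviations.
    private static final Map<String, String> BOOKS = Map.of(
            "gen", "Gen", "genesis", "Gen",
            "mt", "Matt", "matt", "Matt", "matthew", "Matt");

    // "<book>[.] <chapter>[:<verse>]", with an optional leading 1-3 for numbered books
    private static final Pattern REF =
            Pattern.compile("\\s*([1-3]?\\s*[A-Za-z]+)\\.?\\s+(\\d+)(?::(\\d+))?\\s*");

    /** Returns an osisID, or null if the input is not understood. */
    public static String toOsisId(String userInput) {
        Matcher m = REF.matcher(userInput);
        if (!m.matches()) {
            return null;
        }
        String key = m.group(1).replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        String book = BOOKS.get(key);
        if (book == null) {
            return null;
        }
        // chapter-only input yields a chapter-level osisID, e.g. "Matt.5"
        return m.group(3) == null
                ? book + "." + m.group(2)
                : book + "." + m.group(2) + "." + m.group(3);
    }
}
```

With this, "Genesis 1:3" and "Gen 1:3" both become "Gen.1.3", so the lookup key no longer depends on how the user typed the reference.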
If I understand correctly, osisIDs are meant to form a nesting hierarchy.
If it weren't for the fact that an element with an osisID can start
inside one document element and finish outside of it, I think elements
with osisIDs could be represented with ordinary begin and end tags rather
than as milestones.
That is, the milestones would always appear as

  <tag osisID="y" sID="y"/>...<tag eID="y"/>...<tag osisID="w" sID="w"/>...<tag eID="w"/>

and never as

  <tag osisID="y" sID="y"/>...<tag osisID="w" sID="w"/>...<tag eID="y"/>...<tag eID="w"/>
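Whether a given document actually nests can be checked with a stack, the same way one checks balanced brackets. A minimal sketch, assuming milestones are empty elements carrying sID/eID attributes and that each eID equals the sID it closes (as in the examples above); the class and method names are hypothetical and the parser is the JDK's StAX.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical checker: do the sID/eID milestone pairs nest like ordinary tags?
public class MilestoneNesting {
    /** True if every eID closes the most recently opened, still-open sID. */
    public static boolean milestonesNest(String osisXml) {
        Deque<String> open = new ArrayDeque<>(); // sIDs not yet closed, innermost on top
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(osisXml));
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only start tags carry sID/eID attributes
                }
                String sID = r.getAttributeValue(null, "sID");
                String eID = r.getAttributeValue(null, "eID");
                if (sID != null) {
                    open.push(sID);
                } else if (eID != null) {
                    // an eID that does not close the innermost open sID means the pairs overlap
                    if (open.isEmpty() || !open.pop().equals(eID)) {
                        return false;
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return open.isEmpty(); // an unclosed sID also breaks the nesting
    }
}
```

The first pattern above passes this check and the second fails it at `<tag eID="y"/>`, because "w" is still the innermost open milestone. When the check passes, the milestone pairs could in principle be rewritten as ordinary begin and end tags.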
If the hierarchy truly nests, then it may be fairly straightforward to
handle non-Bible texts as well.
(I am sure that I will find out more as I go along.)