[jsword-devel] Updates

Fri Jul 9 05:59:10 MST 2004

After writing the note, I realized that it is really long. So I thought 
that I would give a list of the key points.
1) KeyList or Passage, not Key or Verse, should be the parameter that is 
used consistently in the API.
2) Blurring makes sense for all books.
3) The indexing of Bibles is lossy.
4) JSword ignores the Deuterocanonical books when present.

Extending a class does not make the definition recursive. If the class 
is recursive, there is no need for an extended class. The classic 
definition of a tree is that a node contains zero or more nodes. Thus the 
node class has methods such as void add(Node) and Node getChild(int i).
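
As a rough sketch of that classic definition (the class below is purely 
illustrative and is not JSword code; the name field and getChildCount are 
my own additions):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative tree node. It is recursive because a Node holds other
// Nodes directly; no subclass is needed to get the recursion.
class Node {
    private final String name;
    private final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    // Append a child to this node.
    void add(Node child) {
        children.add(child);
    }

    // Return the i-th child (0-based).
    Node getChild(int i) {
        return children.get(i);
    }

    int getChildCount() {
        return children.size();
    }

    String getName() {
        return name;
    }
}
```

A leaf is simply a Node whose getChildCount() is 0.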

On further thought, making Key have a limited recursive definition 
does not make sense to me. Here is what went through my head regarding 
Key, KeyList, Passage, ...

The problem with KeyList extending Key is that it is not a true "is-a" 
relationship. Judging from how the code uses a Key, a Key describes one 
location in a book, and the length of a Key runs from that location to 
the next sequentially larger key. A KeyList is simply an ordered list of 
possibly non-contiguous Keys. The order is not limited to sequential 
order, but that is how it is usually used.

Within the code, there are places that need a list of references and a 
couple of places that need individual references.

When I looked at OSIS 2, I noticed that the definition of verse 
boundaries was deliberately fuzzy. A translation may collect one or more 
biblical verses into a single OSIS verse. It may also split a biblical 
verse into two or more OSIS verses (1.6a, 1.6b, 1.6c, ...). I guess it 
is a recognition that verses were added after the Bible was written and, 
while they have been canonicalized, some "translations" (e.g. the 
Message and other paraphrases) do not adhere to them. Others, for 
typographical reasons, refine them further by splitting them.

Since we are ultimately mapping a particular book's markup into OSIS, it 
seems reasonable to consider the OSIS definition.

Looking at the actual definition of Key, it has two parts: name and 
parent. In the code, getParent is used in one and only one place, and 
that is in ReferencePane, which is in "limbo". The notion of parent is 
not actually that of a parent in the sense of forming a tree. The code 
does set the parent via various constructors, but in each case the key 
is a member of a list, and in no case is that list a member of another 
list. Thus the only real value of the class Key is that it provides a 
name. I think it would be better to get rid of the notion of parent and 
add an interface, Nameable or Titleable or something similar, which 
represents that getName returns a title that can be displayed, and make 
Key extend it. KeyList would extend Nameable but not Key. KeyList would 
contain Keys.
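
To make the proposal concrete, here is a minimal sketch of what I have in 
mind. It is not the current JSword API. Apart from getName, the method 
names on KeyList and the SimpleKey and SimpleKeyList classes are 
assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Anything that can supply a title suitable for display.
interface Nameable {
    String getName();
}

// A Key names one location in a book. Note there is no getParent().
interface Key extends Nameable {
}

// A KeyList is Nameable and contains Keys, but it is not itself a Key.
interface KeyList extends Nameable {
    void add(Key key);

    Key get(int index);

    int size();
}

// Hypothetical implementation, for illustration only.
final class SimpleKey implements Key {
    private final String name;

    SimpleKey(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

// Hypothetical implementation that keeps Keys in insertion order.
final class SimpleKeyList implements KeyList {
    private final String name;
    private final List<Key> keys = new ArrayList<>();

    SimpleKeyList(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void add(Key key) {
        keys.add(key);
    }

    public Key get(int index) {
        return keys.get(index);
    }

    public int size() {
        return keys.size();
    }
}
```

With this shape, a method that needs a list of references takes a 
KeyList, and the compiler rejects an attempt to pass a KeyList where a 
single Key is expected.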

With regard to your question on blurring, I think that the notion of 
blurring is probably reasonable for all books. Generally a book has some 
kind of divisional boundaries (thus the OSIS div element). And many have 
sub-divisional boundaries (thus the recursive definition of a div 
element). For example, a dictionary has definitions, each of which belongs 
to a "first letter". The dictionary as a whole is made up of letters 
containing definitions.

The interface of blur(int amount, int restriction) makes sense if it is 
thought of as a characteristic of the book and not a characteristic of a 
reference. The reference is a marker in a book, and amount is measured in 
the smallest unit of location identity within the book. The restriction 
should be seen as which divisional boundaries are going to be observed. 
Thus restriction is probably the wrong word to use, and perhaps 
something like division or depth would be better.
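
Here is a small sketch of blurring seen as a property of the book. 
Everything in it is an assumption for illustration: the book is modelled 
as a flat array giving the division of each entry, divisions are assumed 
to be contiguous, and a single boolean stands in for a depth parameter. 
It is not how JSword implements blur.

```java
// Hypothetical model: divisionOf[i] is the id of the division (for
// example, the "first letter" of a dictionary) that contains entry i.
final class BlurSketch {
    private BlurSketch() {
    }

    // Returns {first, last}: the inclusive range of entries reached by
    // widening 'position' by 'amount' entries in both directions. When
    // observeDivision is true, the range is clipped so that it does not
    // cross out of the division that contains 'position'.
    static int[] blur(int[] divisionOf, int position, int amount,
                      boolean observeDivision) {
        int first = Math.max(0, position - amount);
        int last = Math.min(divisionOf.length - 1, position + amount);
        if (observeDivision) {
            int division = divisionOf[position];
            // Both loops stop at 'position' at the latest, because
            // divisionOf[position] == division.
            while (divisionOf[first] != division) {
                first++;
            }
            while (divisionOf[last] != division) {
                last--;
            }
        }
        return new int[] {first, last};
    }
}
```

For a dictionary with entries in divisions {0, 0, 0, 1, 1, 2}, blurring 
entry 3 by 2 gives entries 1 through 5 when divisions are ignored, and 
entries 3 through 4 when they are observed.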

If restriction is defined by the markup, then it would follow the 
publisher's intended layout. The problem with following the markup is 
that CrossWire provides indexes to the start and length of each of the 
references in a book. It does not provide indexes to divisional markers 
other than testament for Bibles. The other divisional markers are 
deduced for Bibles.

While I am on the subject of the indexes, they are lossy. This can be 
easily seen in the view source's original tab. There is no markup around 
the verse. Also, if you take the start and length of one verse and add 
them you will find that it may not mark the start of the next verse as 
contained in the index. Between v1start + v1length and v2start is markup 
and/or content. You can see it in the KJV w/ Strong's.
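
The check I am describing amounts to the following. The IndexEntry class 
and the numbers in the usage note are made up for illustration; they are 
not read from a real module, and this is not the JSword index reader.

```java
// Hypothetical view of one index record: where a verse starts in the
// data file and how many bytes the index says it occupies.
final class IndexEntry {
    final long start;
    final int length;

    IndexEntry(long start, int length) {
        this.start = start;
        this.length = length;
    }

    // First byte after this entry according to the index.
    long end() {
        return start + length;
    }
}

final class IndexGaps {
    private IndexGaps() {
    }

    // Number of bytes between the end of 'first' and the start of
    // 'next'. A positive value is markup and/or content that no verse
    // in the index covers.
    static long gapBetween(IndexEntry first, IndexEntry next) {
        return next.start - first.end();
    }

    static boolean isContiguous(IndexEntry first, IndexEntry next) {
        return gapBetween(first, next) == 0;
    }
}
```

For example, with v1 = new IndexEntry(0, 40) and v2 = new IndexEntry(52, 
30), gapBetween(v1, v2) is 12, so 12 bytes sit between the two verses 
that fetching verse by verse will never return.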

The upshot of this is that it leads to several potential problems 
because each verse has to be handled separately (as is currently done in 
JSword). There is no possibility of grabbing a passage as a whole. If 
the passage is not grabbed as a whole, there is no possibility of a 
pass-through mode for OSIS. The OSIS that is built from getData is not 
the same as the OSIS contained in the book. There is no way to discover 
the true divisional boundaries of a work as the markup may be missing.

In looking at Verse, RocketPassage (BitSet) and BibleInfo, it is clear 
that this is a great optimization for the Old and New Testaments. 
However, at least one of the Bibles has the Deuterocanonical books (aka 
Apocrypha) and these are ignored by JSword. I am not Catholic, and I 
have not had much interest in reading them, but shouldn't we include 
them if they are present?

Joe Walker wrote:
> DM Smith wrote:
> 
>> I was wondering whether it would be useful for a Key to have a limited 
>> recursive definition: A key is a reference to one or more parts of a 
>> book. If so then key list would become an internal implementation of key.
> 
> 
> Unless I misunderstand you, it does already!
> 
>     public interface KeyList extends Key
> 
> Joe.