[jsword-devel] Efficient Bible Text Storage Formats

Erik Reitsma jsword-devel@crosswire.org
Mon, 12 Jan 2004 00:31:18 +0100 (CET)

Hi Stephen and others,

> (I'm finding it a little hard to concentrate at the moment, as 120km/h
> wind gusts are threatening to take the roof off my house, so sorry if I've
> left any sentences unfinished)

I am glad that you survived!

>> Thank you Stephen, for your very interesting ideas. This sounds like a
>> very useful format indeed. Definitely suitable for what I was looking
>> for... Can this format be used freely? (I mean, without patents or the like?)

> I don't know if any of these mentioned techniques or any others that I
> have used are already patented. As far as I know they are not, but that is
> not a guarantee.

At least you are not protecting your ideas through patents. If someone else
has protected their ideas, I understand that it is their responsibility to
protect them. I can deal with that when it comes to it. Since your sources
are public, they are probably not patented either.

> As for the algorithms that I have come up with to produce this format, I
> intend for them to be freely usable by anyone to produce bibles and
> bible-related texts, whether for profit or for free, while I reserve the
> right to use the techniques and algorithms I have come up with for other
> kinds of texts (non-bible dictionaries, general book readers, etc.). If
> anyone else wishes to make money out of my ideas and algorithms, I'd like
> them to pay me whatever royalties they think are appropriate. I intend to
> leave the enforcement of just treatment to God.

Excellent. I do not intend to use these algorithms for anything but a
bible reader for PersonalJava.

> I would like to make money from this, but not at the expense of limiting
> the spread of the good news of Christ.
> Publishing code to read the file format as GPL would mean that the code
> could not be re-used by others unless it was also a GPL project.

Or the author could also make the code available to others under a different license.

> I would prefer the file format reading code to be LGPL. It could then
> still be used in a GPL project, but could also be used by commercial
> vendors who do not wish to publish their own source code.

I do not really mind. More GPL bible software would be to my advantage,
since I am not in the bible software business (it is just a hobby for me).
But I would be flexible with the license, especially towards the one who
came up with the file format :)

> Keep in mind that there are three things we are talking about:
> 1. File format(s) - currently a Palm database, but easily adaptable to
> other, less structured file formats.
> 2. Algorithms to produce the file
> 3. Algorithms to read the file
> I think that algorithms to read the file go hand in hand with the file
> format. Sure, there are specific tricks to reduce memory use or cache
> indexes, etc. But given a file format, an algorithm to read from it can be
> written with ease.

Agreed, they are closely related.
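To illustrate the point that a reader follows almost directly from the format, here is a minimal Java sketch that reads only the generic Palm database (PDB) container: the 78-byte header, the record count at offset 76, and the 8-byte record list that follows it. It assumes the standard PDB layout; how the bible text itself is packed inside the records is not described in this thread and is not attempted. The class name `PdbReader` and its `build` helper (which creates a tiny test image) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: reads the generic PDB container only, not the
// bible-specific record contents discussed in this thread.
public class PdbReader {
    static final int HEADER_SIZE = 78; // fixed PDB header length
    static final int ENTRY_SIZE = 8;   // per record: 4-byte offset, 1 attribute byte, 3-byte unique ID

    // Database name: 32 bytes, NUL-padded, at the start of the header.
    static String name(byte[] pdb) {
        int len = 0;
        while (len < 32 && pdb[len] != 0) len++;
        return new String(pdb, 0, len, StandardCharsets.ISO_8859_1);
    }

    // Type: four-character code at offset 60 (the creator code follows at 64).
    static String type(byte[] pdb) {
        return new String(pdb, 60, 4, StandardCharsets.ISO_8859_1);
    }

    // Number of records: unsigned 16-bit big-endian value at offset 76.
    static int recordCount(byte[] pdb) {
        return ByteBuffer.wrap(pdb).getShort(76) & 0xFFFF;
    }

    // Record i runs from its own offset to the next record's offset (or end of file).
    static byte[] record(byte[] pdb, int i) {
        ByteBuffer buf = ByteBuffer.wrap(pdb); // ByteBuffer defaults to big-endian, as PDB uses
        int n = recordCount(pdb);
        int start = buf.getInt(HEADER_SIZE + ENTRY_SIZE * i);
        int end = (i + 1 < n) ? buf.getInt(HEADER_SIZE + ENTRY_SIZE * (i + 1)) : pdb.length;
        return Arrays.copyOfRange(pdb, start, end);
    }

    // Builds a minimal PDB image for testing; type must be 4 characters.
    // Dates, IDs and attributes are left as zero.
    static byte[] build(String name, String type, byte[]... records) {
        int dataStart = HEADER_SIZE + ENTRY_SIZE * records.length + 2; // 2 bytes of conventional padding
        int total = dataStart;
        for (byte[] r : records) total += r.length;
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.put(name.getBytes(StandardCharsets.ISO_8859_1), 0, Math.min(31, name.length()));
        buf.position(60);
        buf.put(type.getBytes(StandardCharsets.ISO_8859_1), 0, 4);
        buf.putShort(76, (short) records.length);
        int offset = dataStart;
        for (int i = 0; i < records.length; i++) {
            buf.putInt(HEADER_SIZE + ENTRY_SIZE * i, offset); // record list entry: data offset
            buf.position(offset);
            buf.put(records[i]);
            offset += records[i].length;
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // "DATA" is a placeholder type code, not one used by any real bible reader.
        byte[] pdb = build("KJV", "DATA",
                "In the beginning".getBytes(StandardCharsets.ISO_8859_1),
                "God created".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(name(pdb) + " " + type(pdb) + " " + recordCount(pdb));
        System.out.println(new String(record(pdb, 1), StandardCharsets.ISO_8859_1));
    }
}
```

Once `record(pdb, i)` hands back raw bytes, decoding them is entirely a matter of the text format stored inside, which is where the real design work lies.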

> The algorithm to produce the file is quite a different matter. The file
> format contains no information about how to determine the string of ending
> letters, how to choose phrasebook entries, etc. Yet it is these techniques
> that provide most of the compression.

I can see that. For me the advantage is that this does not have to run on
a P800 :)
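As a rough illustration of why the producer, not the file format, carries most of the compression, the sketch below uses a hypothetical word-level phrasebook: each entry is a word or a repeated two-word phrase, and the text is stored as entry indices. Decoding is a plain table lookup; all the interesting decisions sit in `buildBook`, which here uses a deliberately naive frequency count. This is not Stephen's selection technique (which is not described here), and it does not model the "ending letters" part at all.

```java
import java.util.*;

// Hypothetical sketch of a word-level phrasebook; not the format discussed above.
public class PhrasebookSketch {

    // Every distinct word, plus up to maxPhrases word pairs that occur more than once.
    // The naive "most frequent pairs" rule stands in for a real selection technique.
    static List<String> buildBook(String[] words, int maxPhrases) {
        Map<String, Integer> pairCounts = new HashMap<>();
        for (int i = 0; i + 1 < words.length; i++) {
            pairCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
        List<String> pairs = new ArrayList<>(pairCounts.keySet());
        // Most frequent first; ties broken alphabetically so the book is deterministic.
        pairs.sort((a, b) -> {
            int c = Integer.compare(pairCounts.get(b), pairCounts.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        List<String> book = new ArrayList<>();
        for (int i = 0; i < Math.min(maxPhrases, pairs.size()); i++) {
            if (pairCounts.get(pairs.get(i)) > 1) book.add(pairs.get(i)); // only repeated pairs pay off
        }
        for (String w : new TreeSet<>(Arrays.asList(words))) book.add(w);
        return book;
    }

    // Greedy encoding: prefer a two-word entry at each position, else the single word.
    static List<Integer> encode(String[] words, List<String> book) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < book.size(); i++) index.put(book.get(i), i);
        List<Integer> codes = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            if (i + 1 < words.length) {
                Integer pair = index.get(words[i] + " " + words[i + 1]);
                if (pair != null) {
                    codes.add(pair);
                    i += 2;
                    continue;
                }
            }
            codes.add(index.get(words[i]));
            i++;
        }
        return codes;
    }

    // Decoding needs only the book: each code is a table lookup.
    static String decode(List<Integer> codes, List<String> book) {
        StringJoiner out = new StringJoiner(" ");
        for (int c : codes) out.add(book.get(c));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "in the beginning God created the heaven and the earth and the earth was without form";
        String[] words = text.split(" ");
        List<String> book = buildBook(words, 4);
        List<Integer> codes = encode(words, book);
        System.out.println(words.length + " words -> " + codes.size() + " codes");
        System.out.println(decode(codes, book).equals(text));
    }
}
```

On the sample in `main`, the greedy encoder uses "and the" twice but then misses "the earth", because the earlier match has already consumed its first word. Choosing entries and match order well is exactly the part the file format says nothing about.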

> I hadn't looked into what PersonalJava is until today, but it looks like
> it is being discontinued (by Sun) in favour of PBP and PP:
> http://developers.sun.com/techtopics/mobility/personal/articles/pbp_pp/index.html

I had not read this. However, my P800 will still run PersonalJava...

Thanks for the bible. I hope to find some time this week to rewrite your
code in Java.