[sword-devel] HowTo: create ztext module?
DM Smith
dmsmith555 at yahoo.com
Tue May 9 05:07:36 MST 2006
On May 8, 2006, at 8:36 PM, Greg Hellings wrote:
>
> This brings up another interesting question (in my opinion). Why are
> there several standard modules which are distributed without
> compression? Things like the ASV, the Vulgate and the WEB are all
> distributed in uncompressed format. Might it be beneficial for us to
> zip those up (especially the ASV and WEB, which I would imagine are
> both popular modules?) and distribute them in a ztext format? Are
> there any advantages to having them in rawtext rather than ztext,
> except for minor performance advantages? Just curious!
In my opinion ztext serves two purposes:
1) Since it is not a simple compression of the files (i.e. you can't
run unzip on them) but an internal compression of parts of the file,
it raises the importance of the SWORD api in accessing them,
providing a bit of information hiding.
2) The SWORD api downloads the files individually, and having
compressed files improves download performance.
One may argue that it also reduces the disk footprint, which it does.
But in today's world of large disks (even my old win 95 laptop has a
20G drive), I don't think that is much of an issue.
The drawback is that the client application needs to uncompress the
data on the fly, and it does so into memory. Most modules use book
compression, so the entire book needs to be unzipped to read a single
verse. Using the principle of locality, caching the most recent
book's uncompressed text results in reasonable performance, since
most requests for a verse are within the same book as the last verse
requested. The anomalous case is returning ranked verses for a search
on a common word, where consecutive hits can fall in different books.
(Ranked search results are the normal behavior of Lucene.)
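
To make the caching argument concrete, here is a minimal sketch in
Python. It is not the SWORD or JSword implementation: the BookStore
class, its one-blob-per-book layout and the placeholder verse strings
are all assumptions made for illustration. It only shows why
sequential reading is cheap and why jumping between books is not.

```python
import zlib


class BookStore:
    """Toy store: one zlib blob per book, last inflated book kept in memory."""

    def __init__(self, books):
        # books: {book_name: [verse_text, ...]} (hypothetical layout)
        self._blobs = {
            name: zlib.compress("\n".join(verses).encode("utf-8"))
            for name, verses in books.items()
        }
        self._cached_name = None
        self._cached_verses = None
        self.decompressions = 0  # how many times a whole book was inflated

    def verse(self, book, number):
        # Inflate the whole book only if it is not the one already cached.
        if book != self._cached_name:
            text = zlib.decompress(self._blobs[book]).decode("utf-8")
            self._cached_verses = text.split("\n")
            self._cached_name = book
            self.decompressions += 1
        return self._cached_verses[number - 1]


store = BookStore({
    "Gen": ["gen verse 1", "gen verse 2"],
    "John": ["john verse 1", "john verse 2"],
})

# Sequential reading: one decompression serves every verse in the book.
store.verse("Gen", 1)
store.verse("Gen", 2)
sequential = store.decompressions

# Search-style access that alternates between books defeats the cache.
for book in ["John", "Gen", "John", "Gen"]:
    store.verse(book, 1)
scattered = store.decompressions - sequential
```

Here the two sequential reads cost one decompression, while the four
alternating reads cost four, which is the ranked-search case
described above.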
I can appreciate a goal of encouraging the use of the SWORD api for
access of a module's content, whether it was a deliberate or an
accidental goal, but there are other, simple ways to achieve
information hiding that don't have the performance penalty.
There are also ways, though not simpler ones, to compress the files
without stream compression, such that each verse can be uncompressed
independently of the others.
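
One possible shape for this, sketched in Python under stated
assumptions: each verse is compressed on its own and an index of
(offset, length) pairs records where it lives. The function names
pack_verses and read_verse and the index layout are hypothetical and
are not an existing SWORD module format.

```python
import zlib


def pack_verses(verses):
    """Compress each verse separately; return (data, [(offset, length), ...])."""
    data = bytearray()
    index = []
    for text in verses:
        blob = zlib.compress(text.encode("utf-8"))
        index.append((len(data), len(blob)))  # where this verse's blob starts
        data += blob
    return bytes(data), index


def read_verse(data, index, number):
    """Inflate only the requested verse (1-based); no other verse is touched."""
    offset, length = index[number - 1]
    return zlib.decompress(data[offset:offset + length]).decode("utf-8")


verses = ["first verse text", "second verse text", "third verse text"]
data, index = pack_verses(verses)
third = read_verse(data, index, 3)
```

The trade-off is that very short texts compress poorly one at a time.
A shared preset dictionary (the zdict argument of zlib.compressobj
and zlib.decompressobj) is one way to recover some of that, at the
cost of shipping the dictionary with the module.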
With regard to compression of the download file, I think that there
are better ways to handle it as well. Rather than downloading the
parts individually, we could change the installer to download a zip
file of the entire contents.
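
A rough sketch of the installer side, in Python, assuming the whole
module arrives as a single zip. The function install_module_zip and
the example file names are hypothetical, and the archive is built in
memory here rather than downloaded, so this is not the actual SWORD
or JSword installer.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def install_module_zip(zip_bytes, dest_dir):
    """Extract a whole-module zip into dest_dir; return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # testzip() returns the first corrupt member name, or None if all pass.
        bad = archive.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        archive.extractall(Path(dest_dir))
        return sorted(archive.namelist())


# Build a stand-in archive in memory instead of fetching one over the network.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("mods.d/example.conf", "[Example]\n")
    archive.writestr("modules/texts/example/data.bin", b"\x00\x01")

with tempfile.TemporaryDirectory() as tmp:
    installed = install_module_zip(buffer.getvalue(), tmp)
    conf_exists = (Path(tmp) / "mods.d" / "example.conf").is_file()
```

With one archive there is a single transfer to retry or discard, and
the integrity check runs before anything is written to disk.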
As a side note, JSword was changed to download the raw zips rather
than the parts because ftp within Java stopped working after a
security patch on Win XP and it did not work behind a proxy on a Mac.
Now we use http tunneling. A side benefit was that the downloads
were faster and more reliable, and when they failed, the cleanup was
simpler. The code was a bit simpler as well.