[sword-devel] task
Troy A. Griffitts
sword-devel@crosswire.org
Sun, 09 Sep 2001 13:11:56 -0700
> Since I've never committed any code to this project yet, feel free to scold
> me if I'm doing anything wrong ;)
No! Not at all! :)
I realize these are Chris' tasks, but I'm gonna throw in my 2 cents' worth
anyway :)
> > Ciphering of LD texts--
> > This shouldn't be too difficult since you'd be mimicking functionality
> > in the RawText/RawCom classes in RawLD/RawLD4
> One would have to write some tool to cipher the texts and make
> {RawLD,RawLD4} rawFilter() each read entry, right?
Yes, good to see you spent some time understanding the way things work.
This is exactly what needs to be completed.
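
As a rough illustration of the idea above (an offline tool ciphers each
entry, and a filter undoes it on each read), here is a minimal sketch.
EntryFilter, XorCipherFilter, cipherEntry and readEntry are hypothetical
names invented for this example, not the engine's real classes, and the
repeating-key XOR is only a toy stand-in for whatever cipher is used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a raw filter: rewrites one entry in place.
struct EntryFilter {
    virtual ~EntryFilter() = default;
    virtual void process(std::string &entry) const = 0;
};

// Toy symmetric cipher (repeating-key XOR). Assumption for illustration
// only; it is not the cipher the engine uses.
class XorCipherFilter : public EntryFilter {
public:
    explicit XorCipherFilter(std::string key) : key_(std::move(key)) {}

    void process(std::string &entry) const override {
        if (key_.empty()) return;  // nothing to do without a key
        for (std::size_t i = 0; i < entry.size(); ++i)
            entry[i] = static_cast<char>(entry[i] ^ key_[i % key_.size()]);
    }

private:
    std::string key_;
};

// Offline tool side: cipher one entry before it is written to the data file.
std::string cipherEntry(std::string plain, const EntryFilter &cipher) {
    cipher.process(plain);
    return plain;
}

// Reader side: run the same filter over each entry as it is read.
std::string readEntry(std::string stored, const EntryFilter &cipher) {
    cipher.process(stored);
    return stored;
}

// Round trip: what the tool wrote is what the reader gets back.
void cipherRoundTripDemo() {
    XorCipherFilter cipher("secret");
    const std::string stored = cipherEntry("agape: love", cipher);
    assert(readEntry(stored, cipher) == "agape: love");
}
```

Because the toy cipher is symmetric, the same filter object serves both
the tool and the reader; a real cipher may need separate encode and
decode paths.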
> > Ciphering + compression (on LD, Bibles, & commentaries)--
> > Ultimately I'd like ALL texts to be compressed on the site, though I
> > guess we could do a client-side util to uncompress for people who really
> > think the speed improvement is that much more important than disk space
> > lost.
Compression on Lexicons / Dictionaries (LD) is what Bobby Nations is
completing a class to help with.
> > Implement SCSU (de)compression drivers--
> > SCSU is the Standard Compression Scheme for Unicode
> > (http://www.unicode.org/unicode/reports/tr6/), which compresses Unicode
> > streams by using the fact that most characters in a string come from the
> > same code pages and therefore repeat a lot of information. Basically,
> > if you use SCSU and then ZIP the result, you'll get something smaller
> > than either of the compression schemes alone would produce. I'll have
> > SCSU (along with UTF-8/16/32) code from ICU in CVS sometime pretty soon,
> > but it'll still need to be worked into the library.
>
> I'm willing to do it, so here's my idea (just for discussion):
> AFAIK, deciphering is done via rawFilter(). What about doing
> {zlib,SCSU}decompression the same way? We could add a config parameter
> that describes the type of compression of the module data and addRawFilter()
> the relevant filters. Thus we'd have {zlib,SCSU}compression available for
> all types of modules at once.
This was the original way compression was done in the engine.
However...
> This would of course imply that we can only compress one entry at a time,
> probably resulting in bigger compressed files....
This was exactly the result. We really didn't get very good
compression on single-verse buffers. The zVerse class is our
reimplementation. It lets one pass in the 'block type' (verse,
chapter, book, etc.) and will read and de/compress the desired block
of text, then cache it in memory. Obviously, one must choose beforehand
which 'block type' is best for a particular module, and it cannot change
once the module is created with the chosen block scheme. Some
commentaries are better optimized (compression and memory considered)
with verse blocks, whereas most Bibles do well with chapter blocks.
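
The block scheme described above can be sketched roughly as follows.
This is not the zVerse code: BlockType, VerseKey, blockIdFor and
BlockReader are hypothetical names, and the loader callback stands in
for "read the block from disk and decompress it". The point is only
that a verse maps to one block id, and that a whole block is loaded
once and then served from memory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class BlockType { Verse, Chapter, Book };

struct VerseKey { int book; int chapter; int verse; };

using VerseId = std::vector<int>;              // {book, chapter, verse}
using Block = std::map<VerseId, std::string>;  // one decompressed block

// Map a verse to the id of the block containing it. The block type is
// fixed when the module is built, so this mapping never changes.
std::vector<int> blockIdFor(BlockType type, const VerseKey &k) {
    switch (type) {
        case BlockType::Book:    return {k.book};
        case BlockType::Chapter: return {k.book, k.chapter};
        case BlockType::Verse:   return {k.book, k.chapter, k.verse};
    }
    return {};
}

class BlockReader {
public:
    // Hypothetical callback: reads and decompresses one whole block.
    using Loader = std::function<Block(const std::vector<int> &blockId)>;

    BlockReader(BlockType type, Loader loader)
        : type_(type), loader_(std::move(loader)) {}

    std::string text(const VerseKey &k) {
        const std::vector<int> id = blockIdFor(type_, k);
        if (!cached_ || id != cachedId_) {
            cache_ = loader_(id);  // decompress the whole block once
            cachedId_ = id;
            cached_ = true;
            ++loads_;
        }
        auto it = cache_.find(VerseId{k.book, k.chapter, k.verse});
        return it == cache_.end() ? std::string() : it->second;
    }

    int loads() const { return loads_; }

private:
    BlockType type_;
    Loader loader_;
    Block cache_;
    std::vector<int> cachedId_;
    bool cached_ = false;
    int loads_ = 0;
};

void blockCacheDemo() {
    BlockReader reader(BlockType::Chapter, [](const std::vector<int> &id) {
        Block b;
        b[VerseId{id[0], id[1], 1}] = "first verse";
        b[VerseId{id[0], id[1], 2}] = "second verse";
        return b;
    });
    reader.text({1, 1, 1});
    reader.text({1, 1, 2});  // same chapter: served from the cache
    assert(reader.loads() == 1);
    reader.text({1, 2, 1});  // new chapter: one more load
    assert(reader.loads() == 2);
}
```

Larger blocks give the compressor more repeated material to work with
but cost more memory per cached block, which is the trade-off between
verse and chapter blocks mentioned above.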
The zVerse class' c-tor also takes an SWCompress * to do the compression
work, so, in my opinion, we really need an SWCompress * subclass--
SCSUCompress : public SWCompress that understands the SCSU compression
scheme.
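
To show the shape of that plug-in arrangement, here is a small sketch
under heavy assumptions. CompressorSketch is a hypothetical stand-in
for the SWCompress interface (whose real methods are not shown in this
thread), CompressedModuleSketch stands in for a class that takes the
compressor pointer in its constructor, and RunLengthCompress is a toy
run-length coder sitting where an SCSUCompress would go. It does not
implement SCSU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the compressor base class.
class CompressorSketch {
public:
    virtual ~CompressorSketch() = default;
    virtual std::string compress(const std::string &plain) const = 0;
    virtual std::string decompress(const std::string &packed) const = 0;
};

// Toy run-length coder: each run becomes a (count, byte) pair with
// count in 1..255. It only marks the place of a real SCSU subclass.
class RunLengthCompress : public CompressorSketch {
public:
    std::string compress(const std::string &plain) const override {
        std::string out;
        for (std::size_t i = 0; i < plain.size();) {
            std::size_t run = 1;
            while (i + run < plain.size() && plain[i + run] == plain[i] &&
                   run < 255)
                ++run;
            out.push_back(static_cast<char>(static_cast<unsigned char>(run)));
            out.push_back(plain[i]);
            i += run;
        }
        return out;
    }

    std::string decompress(const std::string &packed) const override {
        std::string out;
        for (std::size_t i = 0; i + 1 < packed.size(); i += 2) {
            const std::size_t run = static_cast<unsigned char>(packed[i]);
            out.append(run, packed[i + 1]);
        }
        return out;
    }
};

// Hypothetical module class: it is handed a compressor and never needs
// to know which scheme is behind the pointer.
class CompressedModuleSketch {
public:
    explicit CompressedModuleSketch(const CompressorSketch *comp)
        : comp_(comp) {}

    void store(const std::string &block) { packed_ = comp_->compress(block); }
    std::string load() const { return comp_->decompress(packed_); }
    std::size_t storedSize() const { return packed_.size(); }

private:
    const CompressorSketch *comp_;  // not owned
    std::string packed_;
};

void compressorPlugInDemo() {
    RunLengthCompress rle;
    CompressedModuleSketch module(&rle);
    module.store("aaaabbb");
    assert(module.storedSize() == 4);  // two (count, byte) pairs
    assert(module.load() == "aaaabbb");
}
```

Swapping in a different scheme then means writing one more subclass and
passing a pointer to it, with no change to the module class.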
WARNING ("military software detected" :) ): The SWCompress code is
CHEEZY :) Weird protected buffer that has a few nuances about
allocation or something-- I really don't remember, but I believe they
are documented in the other 2 impls-- ZipCompress, and LZSSCompress.
Helmer, thank you! Looking forward to hearing your comments and working
with you.
-Troy.