[sword-devel] task

Helmer Krämer sword-devel@crosswire.org
Mon, 10 Sep 2001 14:01:10 +0200

On Sun, 9 Sep 2001 16:06:42 -0700
"Chris Little" <chrislit@chiasma.org> wrote:

> *sigh*  Troy is just completely wrong here and I think we need to take
> away his CVS write access for a week or so as punishment. ;)

Boy, I realize I have to be utterly careful; you're pretty harsh ;)

> In addition, SCSU works very well on small strings (unlike ZIP or LZSS)
> and you would only save one or two bytes by grouping verses as we do
> with ZIP & LZSS.  So for that reason, I would actually recommend against
> subclassing SWCompress to do the SCSU driver, and would suggest instead
> subclassing SWFilter and doing a SCSU to UTF-8 or UTF-16 filter (using
> the ICU macros that I still haven't committed).

If encoding only one entry at a time isn't a problem, writing a SWFilter
is the correct approach, IMHO, since all the filters of a module process
only one entry at a time anyway. To make this more robust, I'd propose
modifying SWFilter::ProcessText to return a new buffer containing the
processed text rather than writing the result into the passed-in buffer.
Otherwise we may run into problems, e.g. when the processed text is longer
than the buffer that was passed in.
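Both ideas (an SCSU-to-UTF-8 filter, and a ProcessText that hands back its own buffer) might look roughly like the minimal C++ sketch below. TextFilter, ScsuToUtf8Filter and appendUtf8 are hypothetical names, not SWORD's actual SWFilter API. The decoder also covers only a subset of SCSU (Unicode Technical Standard #6): single-byte mode with the default window offsets, handling only the SQn and SCn tags. It illustrates why writing into a fixed-size input buffer is awkward: three bytes of SCSU can decode to four bytes of UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical interface (NOT SWORD's SWFilter): processText() returns a
// new buffer instead of overwriting the caller's, so output may grow.
class TextFilter {
public:
    virtual ~TextFilter() = default;
    virtual std::string processText(const std::string& in) const = 0;
};

// Append a code point below U+10000 (all this SCSU subset can produce)
// to `out` as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes a SUBSET of SCSU: single-byte mode, default windows, literal
// bytes plus SQn (quote one char from window n) and SCn (change active
// dynamic window). SDn, SDX, SQU, SCU and the reserved tag throw.
class ScsuToUtf8Filter : public TextFilter {
public:
    std::string processText(const std::string& in) const override {
        static const std::uint32_t staticWin[8] = {
            0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000};
        // Initial offsets of the eight dynamic windows.
        static const std::uint32_t dynamicWin[8] = {
            0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00};
        std::string out;
        out.reserve(in.size() * 2);  // the output is often larger than the input
        int active = 0;              // active dynamic window, initially 0
        for (std::size_t i = 0; i < in.size(); ++i) {
            unsigned char b = static_cast<unsigned char>(in[i]);
            if (b >= 0x80) {
                // High bytes map into the active dynamic window.
                appendUtf8(out, dynamicWin[active] + (b - 0x80));
            } else if (b >= 0x20 || b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D) {
                appendUtf8(out, b);  // ASCII and pass-through control chars
            } else if (b >= 0x01 && b <= 0x08) {  // SQn: quote next byte
                if (++i == in.size()) throw std::runtime_error("truncated SQn");
                int n = b - 0x01;
                unsigned char c = static_cast<unsigned char>(in[i]);
                appendUtf8(out, c < 0x80 ? staticWin[n] + c
                                         : dynamicWin[n] + (c - 0x80));
            } else if (b >= 0x10 && b <= 0x17) {  // SCn: switch window
                active = b - 0x10;
            } else {
                throw std::runtime_error("SCSU tag not supported by this sketch");
            }
        }
        return out;
    }
};
```

For example, the SCSU bytes 0x12 0xB0 0xB1 (SC2, which selects the Cyrillic window at U+0400, followed by two characters) decode to U+0430 U+0431, which takes four bytes in UTF-8. A real driver would also have to handle window definitions (SDn/SDX) and Unicode mode (SQU/SCU), or delegate to ICU.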

> I know we were sort of targeting doing all modules in UTF-8, but I
> think there's a case to be made for using other Unicode encodings, at
> least when they save space.

I think having different encodings for the data stored on disk and the data
passed to the app is the right way to go: the app programmer wants an
encoding that's easy to use (probably UTF-8), while we want an encoding that
saves as much space as possible. Moreover, we can always add the appropriate
filters to convert from one encoding to the other (we might even let the app
programmer specify the final encoding, though this may cause problems with
the other filters of a module, since they would need to be able to work with
that final encoding).
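Letting the app programmer pick the final encoding could look roughly like the minimal C++ sketch below. It assumes, purely for illustration, that a module's filters are plain text-in/text-out functions that all operate on UTF-8, and that the app-requested conversion (UTF-16LE here) runs only after every other filter. Filter, decodeUtf8, encodeUtf16le and runChain are hypothetical names, not part of SWORD.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: a module filter is simply "text in, new text out" here.
using Filter = std::function<std::string(const std::string&)>;

// Decode UTF-8 into code points. Rejects bad lead/continuation bytes and
// truncated sequences, but (for brevity) not overlong forms or surrogates.
static std::vector<std::uint32_t> decodeUtf8(const std::string& s) {
    std::vector<std::uint32_t> cps;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int len = b < 0x80 ? 1 : (b >> 5) == 0x06 ? 2
                : (b >> 4) == 0x0E ? 3 : (b >> 3) == 0x1E ? 4 : 0;
        if (len == 0 || i + len > s.size())
            throw std::runtime_error("invalid UTF-8");
        std::uint32_t cp = (len == 1) ? b : (b & (0x7F >> len));
        for (int k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8");
            cp = (cp << 6) | (c & 0x3F);
        }
        cps.push_back(cp);
        i += len;
    }
    return cps;
}

// Encode code points as UTF-16LE bytes, using surrogate pairs above U+FFFF.
static std::string encodeUtf16le(const std::vector<std::uint32_t>& cps) {
    std::string out;
    auto put = [&out](std::uint32_t unit) {
        out += static_cast<char>(unit & 0xFF);         // low byte first
        out += static_cast<char>((unit >> 8) & 0xFF);  // then high byte
    };
    for (std::uint32_t cp : cps) {
        if (cp >= 0x10000) {
            cp -= 0x10000;
            put(0xD800 | (cp >> 10));    // high surrogate
            put(0xDC00 | (cp & 0x3FF));  // low surrogate
        } else {
            put(cp);
        }
    }
    return out;
}

// Run the module's own filters (which all see UTF-8), then convert to
// UTF-16LE only if the app asked for it, as the very last step.
std::string runChain(const std::string& text, const std::vector<Filter>& filters,
                     bool wantUtf16) {
    std::string cur = text;
    for (const Filter& f : filters) cur = f(cur);
    return wantUtf16 ? encodeUtf16le(decodeUtf8(cur)) : cur;
}
```

Placing the conversion last is one way to avoid the problem mentioned above: the module's other filters only ever have to understand UTF-8, whatever the app requests. It only works, of course, if no filter needs to run after the final conversion.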