[sword-devel] unicode / utf-8

Martin Gruner sword-devel@crosswire.org
Fri, 25 May 2001 14:18:23 +0200

> > > Lot's of things to consider over the next few weeks as we try to hash
> > > out an initial shot at supporting this new range of modules.
> If Martin talks us into using iso8859 and other 8/16-bit encodings to save
> space, there are some very nice conversion tables at
> http://www.unicode.org/Public/MAPPINGS/.  And it might be nice to provide
> mechanisms for this to aid front-ends that have no hope of Unicode support.

Well, Troy's comments on UTF-8 were a delight to read. I didn't know
that UTF-8 is a variable-length encoding and therefore does not
blow up the size of most of the modules.
So I suggest using UTF-8 for _all_ of the sword modules instead of
iso8859-x etc. Sword could handle the characters as unsigned long internally,
which may be easier than working with variable-length characters:
fixed-size chars would make the handling much simpler.
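
A rough sketch of that internal decoding step, under some assumptions:
decodeUtf8 is a made-up name and not an existing Sword function,
std::uint32_t stands in for the unsigned long mentioned above, and
malformed bytes are simply replaced by U+FFFD. It does not reject overlong
forms or surrogates, so treat it as an illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of the Sword API): decode a UTF-8 byte
// string into fixed-width 32-bit code points. Malformed input becomes
// U+FFFD; overlong forms and surrogates are NOT checked in this sketch.
std::vector<std::uint32_t> decodeUtf8(const std::string &in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : 0xFFFD);
    }
    return out;
}
```

Once the text is in such a vector, the n-th character is just element n,
which is the simplification I have in mind.
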
We could still support modules with different encodings, which would be
mapped into Unicode internally. And there would be output routines which
convert the Unicode chars to a frontend-specific encoding, say iso8859-1 for
irenaeus in the western locale.
This would increase the usability and efficiency of sword a lot.
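
For the output side, a minimal sketch of such a routine for iso8859-1.
The name toLatin1 and the '?' fallback are assumptions for illustration,
not existing Sword code. iso8859-1 is the easy case because its 256
characters coincide with the first 256 Unicode code points; other target
encodings would need a lookup table such as the ones at
http://www.unicode.org/Public/MAPPINGS/.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical output routine (not part of the Sword API): convert
// fixed-width Unicode code points to an iso8859-1 byte string.
// Code points above U+00FF have no iso8859-1 equivalent and are
// replaced by the fallback character.
std::string toLatin1(const std::vector<std::uint32_t> &text,
                     char fallback = '?') {
    std::string out;
    out.reserve(text.size());
    for (std::uint32_t cp : text) {
        if (cp <= 0xFF) {
            // iso8859-1 byte value equals the Unicode code point.
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back(fallback);
        }
    }
    return out;
}
```

A frontend without any Unicode support would then receive plain
one-byte-per-character strings and never see the internal representation.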