[sword-devel] Unicode
Martin Gruner
sword-devel@crosswire.org
Sun, 27 May 2001 19:35:01 +0200
> 1. Encode with UTF-8 whenever possible. (Probably a bad idea.)
> 2. Encode with ISO8859-1 (Latin-1) whenever possible and then UTF-8
> whenever possible if ISO8859-1 won't work, which alleviates the problem
> of accents & umlauts increasing in size.
> 3. Encode with all ISO8859 encodings and similar 8/16-bit encodings
> whenever possible, using UTF-8 as a fallback when possible, which
> alleviates many more module size problems.
>
> The question is how much processing we are willing to do in Sword to
> convert between encodings vs. how large we are willing to allow our
> modules to become. One thing we have in our favor is that all of these
> modules can be targeted at Sword 1.5+, so we can compress them. But a
> compressed UTF-8 NA27 is still going to be larger than a NA27 encoded in
> ISO8859-7.
A compressed NA27 in ISO8859-7 would be the best option in terms of module
size.
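To put numbers on the size question, here is a minimal Python sketch that
encodes the same Greek text as UTF-8 and as ISO8859-7 and reports the raw and
zlib-compressed sizes. The function name encoded_sizes and the sample string
are made up for illustration; it is not Sword code and the sample is not real
NA27 data. Every Greek letter takes two bytes in UTF-8 and one byte in
ISO8859-7, so the raw sizes differ by nearly a factor of two. How much of that
gap survives compression has to be measured on the real module text, since a
repeated sample like this one compresses unrealistically well.

```python
import zlib


def encoded_sizes(text, codecs_to_try=("utf-8", "iso8859_7")):
    """Return {codec: (raw_size, zlib_compressed_size)} in bytes for text."""
    sizes = {}
    for name in codecs_to_try:
        raw = text.encode(name)
        sizes[name] = (len(raw), len(zlib.compress(raw, 9)))
    return sizes


if __name__ == "__main__":
    # Illustrative unaccented Greek sample, not real NA27 module data.
    sample = "Εν αρχη ην ο λογος, και ο λογος ην προς τον θεον. " * 100
    for name, (raw, packed) in encoded_sizes(sample).items():
        print(f"{name:10s} raw={raw:6d} bytes  zlib={packed:5d} bytes")
```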
My proposal:
-store the modules in whatever encoding you like
-for every encoding, write an encoding->Unicode filter and a
Unicode->encoding filter
-handle all strings as Unicode internally
-let the frontend decide which encoding to use for output (e.g. iso-8859-7
vs. UTF-8)
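
The proposal above could look roughly like the following Python sketch. The
names EncodingFilter and render are hypothetical and only illustrate the idea;
they are not part of the Sword API, and Sword itself is written in C++. Each
module encoding gets one filter with two directions, all text is held as
Unicode in between, and the frontend picks the output encoding. Replacing
characters the output encoding cannot represent with "?" is an assumption of
this sketch, not something decided in this thread.

```python
import codecs


class EncodingFilter:
    """Hypothetical filter pair for one encoding (not Sword's actual API)."""

    def __init__(self, codec_name):
        codecs.lookup(codec_name)  # raises LookupError for unknown encodings
        self.codec_name = codec_name

    def to_unicode(self, raw):
        """encoding -> Unicode: decode module bytes into a Unicode string."""
        return raw.decode(self.codec_name)

    def from_unicode(self, text):
        """Unicode -> encoding: characters that do not fit become '?'."""
        return text.encode(self.codec_name, errors="replace")


def render(raw, module_encoding, output_encoding):
    """Convert stored module bytes to the encoding the frontend asked for."""
    text = EncodingFilter(module_encoding).to_unicode(raw)  # Unicode internally
    return EncodingFilter(output_encoding).from_unicode(text)


if __name__ == "__main__":
    stored = "λογος".encode("iso8859_7")  # module stored as ISO8859-7
    print(render(stored, "iso8859_7", "utf-8"))      # frontend wants UTF-8
    print(render(stored, "iso8859_7", "iso8859_7"))  # frontend wants ISO8859-7
```

With this layout, supporting a new module encoding means adding one filter
pair; the frontends do not have to change.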
> The nicest solution may be to allow flexibility for module makers and
> frontend makers by supporting texts encoded in UTF-8, ISO8859-x, etc.
> and translating to the desired encoding, just as we do with different
> markup filters.
Yes, as long as we use standardized encodings that do not depend on a
special font.
> There's a further issue of Unicode's incompleteness. Harry has
> mentioned there are still some issues with Hebrew support in Unicode
> 3.0. There are very few fonts even made to support some of the new
> glyphs in Unicode 3.0. As an example, while making a Peshitta module
> last night, I wanted to convert from a custom font encoding over to
> UTF-8. Syriac was only added in Unicode 3.0, so I only found one font
> that supports its glyphs. Even so, it appears that the Syriac
> implementation in Unicode 3.0 may be incomplete for the purposes of this
> text.
Well, this is a serious problem. For modules like this (no proper encoding
available) it might be necessary to use a font-specific encoding
(encoding=fontspecific) together with a specific font. I guess Unicode will
take care of this in the future.
> Why does it seem that once we scale a tall mountain, we find an even
> taller mountain waiting behind it to be conquered as well?
Matthew 21:21
And Jesus in answer said to them, Truly I say to you, If you have faith,
without doubting, not only may you do what has been done to the fig-tree, but
even if you say to this mountain, Be taken up and put into the sea, it will
be done. ;)
Martin