[bt-devel] RE: UTF-8 and new module classes
Martin Gruner
bt-devel@crosswire.org
Thu, 24 May 2001 19:17:39 +0200
Hi Joachim,
> I think UTF-8 is a standard. Wouldn't it be better to have all modules
> available in UTF-8 so all the fonts problems go away?
Yes and no. UTF-8 is simply not necessary for the majority of modules.
Non-ASCII text such as Greek or Cyrillic would take about twice the size,
since each of those characters needs 2 bytes in UTF-8 (plain ASCII stays at
1 byte). And there might be frontends which are not able to display Unicode
at all (e.g. irenaeus).
But if the modules are encoded with the correct language-specific encodings,
each character is still 1 byte, and it is very easy to map these encodings
into UTF-8. So we could work with Unicode internally while other apps do not
have to, and the modules stay small.
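
As a rough illustration of how cheap that mapping is, here is a minimal
Python sketch. It is not sword code; the helper name to_utf8 and the sample
words are assumptions made up for this example, and it only relies on
Python's built-in iso8859_7 and utf-8 codecs.

```python
def to_utf8(module_bytes: bytes, encoding: str) -> bytes:
    """Re-encode text stored in a 1-byte encoding as UTF-8.

    to_utf8 is a hypothetical helper for illustration, not a sword API.
    """
    return module_bytes.decode(encoding).encode("utf-8")


# Sample Greek word as it would be stored in an iso8859-7 module.
greek_iso = "λόγος".encode("iso8859_7")
greek_utf8 = to_utf8(greek_iso, "iso8859_7")

# Plain ASCII text does not grow when mapped to UTF-8.
english_utf8 = to_utf8(b"word", "ascii")

print(len(greek_iso), len(greek_utf8))  # 5 10
print(len(english_utf8))                # 4
```

The Greek word doubles in size in UTF-8, while the stored iso8859-7 form
stays at one byte per character and can be converted on the fly.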
The point is that the modules should be rebuilt using those iso8859-x
encodings, which is _much_ better than just encoding with some font-specific
ASCII encoding, which we cannot map into Unicode.
I wonder how searching in Unicode modules works. Does sword now use Unicode
internally?
Martin
> > I favor moving from the font= tag to an encoding= tag. This way we would
> > not have to use huge fonts, but would still have the flexibility to let
> > the user choose his/her font. E.g. encoding=iso8859-7 would define Greek
> > text. You can then just display this text with a 1-byte iso8859-7 font or
> > map it into Unicode for different purposes.
> > IMO using standards is always a good way to go.
> > We could implement some mapping filters in sword which map from
> > font-specific ASCII encodings to the correct language-specific encodings
> > (like a bstgreek2iso8859-7 filter) to also support frontends favoring the
> > font= solution.
> >
> > Some good links I want to recommend to you:
> > http://czyborra.com/
> > http://czyborra.com/charsets/iso8859.html
> > http://czyborra.com/charsets/cyrillic.html
> >
> > Martin
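
The mapping filter idea quoted above (something like a bstgreek2iso8859-7
filter) could be sketched roughly as follows. This is only an illustration in
Python, not sword code: the table FONT_TO_GREEK is a made-up layout and NOT
the real BSTGreek mapping, and the function name is hypothetical.

```python
# HYPOTHETICAL font-specific layout for illustration only.
# This is NOT the real BSTGreek table; a real filter needs the full,
# verified mapping for the font in question.
FONT_TO_GREEK = {
    "a": "α",
    "b": "β",
    "g": "γ",
    "l": "λ",
    "o": "ο",
    "s": "σ",
}


def fontspecific_to_iso8859_7(text: str) -> bytes:
    """Map font-specific ASCII text to iso8859-7 bytes.

    Characters without a table entry are passed through unchanged.
    """
    mapped = "".join(FONT_TO_GREEK.get(ch, ch) for ch in text)
    return mapped.encode("iso8859_7")


iso_bytes = fontspecific_to_iso8859_7("logos")
print(len(iso_bytes))                 # 5, still one byte per character
print(iso_bytes.decode("iso8859_7"))  # the same text as Unicode
```

Once the text is in iso8859-7, a frontend can either display it with a
matching 1-byte font or decode it to Unicode.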