[sword-devel] What character encoding should I use.
Troy A. Griffitts
sword-devel@crosswire.org
Tue, 30 Jul 2002 21:26:27 -0700
Steve,
Welcome! Great to have you!
> I just wanted to announce myself to the list. My main interests for this
> project are in localization (don't let my name fool you, I speak fluent
> Vietnamese, and I'm getting started on Croatian). I also might be able to
> help out with the "writing" stuff. I don't know if I'll be able to help with
> any technical docs, but I'd be happy to contact publishers for licenses to
> distribute and all.
That would be awesome! The ability to speak with non-English Bible
copyright holders is not an ability that I, personally, have.
> I have a question about the localization stuff. Which encoding are we to use
> when we write the localization files?
Well, currently that all depends on the locale. For the uilocale for
the Windows frontend, it would be whatever works for the version of
Windows that you use. We've had Thai and Chinese work, somehow, in some
encoding, and I would have never thought that was possible :) So, I
would suggest starting out by translating a single entry in your Windows
locale file, running the program, and seeing whether it displays correctly.
We're planning to soon include an Encoding= entry for the locale files.
This will allow different files for different usages, and possibly an
encoding manager to convert between them for different purposes. That
would require us to know what Windows (or another application) wants for
each platform and version. Currently the locale writer can decide what is
best/works for his application.
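As a rough sketch of how a reader might honor such an entry (Python for
brevity, not SWORD code): the [Meta]/[Text] sections, the key names, the
Latin-1 default, and the function name read_locale_entries are all
assumptions made up for illustration. Only the Encoding= entry itself is
what we are planning.

```python
def read_locale_entries(raw: bytes, default_encoding: str = "latin-1") -> dict:
    """Parse key=value lines, decoding with the file's own Encoding= entry.

    Hypothetical helper: assumes an ASCII-compatible encoding (Latin-1,
    UTF-8, ...), so the Encoding= line can be found before decoding.
    """
    encoding = default_encoding
    for line in raw.splitlines():
        if line.strip().lower().startswith(b"encoding="):
            encoding = line.split(b"=", 1)[1].strip().decode("ascii")
            break

    entries = {}
    for line in raw.decode(encoding).split("\n"):
        line = line.strip()
        # Skip blanks, [Section] headers, and anything that is not key=value.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


# A locale with no Encoding= entry falls back to the assumed legacy default.
legacy = "[Text]\nGenesis=Génesis\n".encode("latin-1")
# A newer locale declares its encoding explicitly.
modern = "[Meta]\nEncoding=UTF-8\n[Text]\nGenesis=Sáng Thế Ký\n".encode("utf-8")

print(read_locale_entries(legacy)["Genesis"])
print(read_locale_entries(modern)["Genesis"])
```

The point of the design is that existing files keep working untouched,
while a new file opts in to a different encoding by declaring it.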
For the engine's locale, including booknames and definitions, we
currently only support Latin-based languages well. We use a simple
toupper function that knows nothing about the computer's locale or about
different uppercasing rules, except for a handful of Latin languages. We
can switch this easily to utf8, but that would currently require all
other locales to be rewritten, and would also require a dependency on a
library called ICU, which we can currently compile with or without. If
we change the locales to utf8, they will not work correctly without ICU.
If we leave them as is, we will not be able to support other languages
well. The solution posed thus far is to include the Encoding= entry to
allow locales that can work without ICU to continue to NOT be in utf8.
New locales can be written in utf8 for other languages, and those can be
used if ICU is present. This gives us the best of both worlds. Hope this
helps.
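To illustrate why a simple byte-wise toupper and utf8 don't mix, here is
a small sketch (Python again, not the engine's actual code). The
function naive_toupper is a made-up stand-in that only knows ASCII and
Latin-1 letters, and Python's built-in str.upper() stands in for the
Unicode-aware case mapping that ICU would provide.

```python
def naive_toupper(data: bytes) -> bytes:
    """Uppercase one byte at a time, knowing only ASCII and Latin-1 letters."""
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:                    # ASCII a-z
            out.append(b - 0x20)
        elif 0xE0 <= b <= 0xFE and b != 0xF7:    # Latin-1 letters, skipping the division sign
            out.append(b - 0x20)
        else:
            out.append(b)
    return bytes(out)


# Latin-1 text: every character is one byte, so the simple rule works.
print(naive_toupper("génesis".encode("latin-1")).decode("latin-1"))  # GÉNESIS

# UTF-8 text: the accented letter is two bytes and is silently left lowercase.
print(naive_toupper("génesis".encode("utf-8")).decode("utf-8"))      # GéNESIS

# Worse: a three-byte character has its lead byte altered, corrupting the text.
word = "th\u1ebf"  # Vietnamese "the" with circumflex and acute (U+1EBF)
broken = naive_toupper(word.encode("utf-8"))
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("byte-wise toupper produced invalid UTF-8")

# A Unicode-aware mapping handles the same word correctly.
print(word.upper())
```

So a utf8 locale run through the simple function either misses letters
or damages them, which is why those locales would need ICU present.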
All that to say: try playing around. Try looking at some of the other
15 or so locales that have been submitted. Let me know what you find to
work for your application, and submit what works. Thanks for your
willingness to help!
-Troy.