[sword-devel] What character encoding should I use.

Troy A. Griffitts sword-devel@crosswire.org
Tue, 30 Jul 2002 21:26:27 -0700


Steve,
	Welcome!  Great to have you!

> I just wanted to announce myself to the list.  My main interests for this 
> project are in localization (don't let my name fool you, I speak fluent 
> Vietnamese, and I'm getting started on Croatian).  I also might be able to 
> help out with the "writing" stuff.  I don't know if I'll be able to help with 
> any technical docs, but I'd be happy to contact publishers for licenses to 
> distribute and all.

That would be awesome!  The ability to speak with non-English Bible 
copyright holders is not an ability that I, personally, have.


> I have a question about the localization stuff.  Which encoding are we to use 
> when we write the localization files?

Well, currently that all depends on the locale.  For the uilocale for 
the Windows frontend, it would be whatever works for the version of 
Windows that you use.  We've had Thai and Chinese work, somehow, in some 
encoding, and I would never have thought that was possible :)  So, I 
would suggest starting out by translating a single entry in your Windows 
locale file, running the program, and seeing if it displays correctly.

We're planning to soon include an Encoding= entry in the locale files.  
This will allow different files for different uses, and possibly an 
encoding manager to convert between them as needed.  That would require 
us to know what Windows (or any other application) wants for each platform 
and version.  Currently the locale writer can decide what is best/works 
for his application.
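
A rough sketch of how a reader might honor such an Encoding= entry, 
written in Python for brevity.  The file layout ([Meta] and [Text] 
sections), the read_locale name, and the Latin-1 fallback are all 
assumptions made for illustration, not the engine's actual behavior.

```python
def read_locale(raw: bytes, default_encoding: str = "latin-1") -> dict:
    """Decode a locale file according to its own Encoding= entry.

    Hypothetical helper: the key name comes from the plan above, the
    rest (fallback encoding, section handling) is assumed.
    """
    encoding = default_encoding
    # The Encoding= line itself is plain ASCII, so it can be found
    # before we know how to decode the rest of the file.
    for line in raw.splitlines():
        if line.strip().lower().startswith(b"encoding="):
            encoding = line.split(b"=", 1)[1].strip().decode("ascii")
            break

    entries = {}
    for line in raw.decode(encoding).split("\n"):
        line = line.strip()
        # Skip blanks, section headers, comments, and malformed lines.
        if not line or line.startswith(("[", "#")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


# A UTF-8 locale declares itself; a legacy one falls back to the default.
utf8_file = "[Meta]\nName=vi\nEncoding=UTF-8\n[Text]\nGenesis=S\u00e1ng Th\u1ebf\n".encode("utf-8")
legacy_file = "[Meta]\nName=fr\n[Text]\nGenesis=Gen\u00e8se\n".encode("latin-1")
print(read_locale(utf8_file)["Genesis"])
print(read_locale(legacy_file)["Genesis"])
```

Files without the entry keep working unchanged, which is the point of 
making the entry optional.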

For the engine's locale, including booknames and definitions, we 
currently only support Latin-based languages well.  We use a simple 
toupper function that knows nothing about the computer's locale and has 
no rules for proper uppercasing except for a handful of 
Latin-based languages.  We can switch this easily to UTF-8, but that would 
currently require all the other locales to be rewritten, and would also 
require a dependency on a library called ICU, which we can currently 
compile with or without.  If we change the locales to UTF-8, they will 
not work correctly without ICU.  If we leave them as is, we will not be 
able to support other languages well.  The solution proposed thus far is to 
include the Encoding= entry to allow locales that can work without ICU 
to continue to NOT be in UTF-8.  New locales can then be written in UTF-8 
for other languages, and those can be used if ICU is present.  This gives 
us the best of both worlds.  Hope this helps.
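
To show why a simple byte-wise toupper and UTF-8 don't mix, here is a 
small Python sketch.  The latin1_toupper function is a hypothetical 
stand-in that follows Latin-1 rules; it is not the engine's actual 
code, and the book names are only sample data.

```python
def latin1_toupper(data: bytes) -> bytes:
    """Uppercase one byte at a time using Latin-1 rules (illustrative only)."""
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:
            b -= 0x20                      # ASCII a-z
        elif 0xE0 <= b <= 0xFE and b != 0xF7:
            b -= 0x20                      # Latin-1 accented letters, skipping the division sign
        out.append(b)
    return bytes(out)


french = "Gen\u00e8se"            # "Genèse"
vietnamese = "S\u00e1ng Th\u1ebf"  # "Sáng Thế"

# Latin-1 input: every character is one byte, so the accent is handled.
print(latin1_toupper(french.encode("latin-1")).decode("latin-1"))

# UTF-8 input: the accented letter is two bytes that the byte-wise
# rules leave alone, so it silently stays lowercase.
print(latin1_toupper(french.encode("utf-8")).decode("utf-8"))

# UTF-8 input with a three-byte character: the lead byte 0xE1 is
# "uppercased" to 0xC1, and the result is no longer valid UTF-8.
try:
    latin1_toupper(vietnamese.encode("utf-8")).decode("utf-8")
except UnicodeDecodeError as err:
    print("corrupted:", err.reason)
```

Uppercasing UTF-8 text properly means working on whole characters with 
real case-mapping tables, which is the job ICU would do here.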

All that to say: try playing around.  Try looking at some of the other 
15 or so locales that have been submitted.  Let me know what you find 
works for your application, and submit what works.  Thanks for your 
willingness to help!

	-Troy.