[sword-devel] Greek dictionary - input needed

Tue Jan 20 22:29:18 MST 2009

Ben Morgan wrote:
> On Wed, Jan 21, 2009 at 12:41 AM, DM Smith <dmsmith555 at yahoo.com> wrote:
> 
> 
>     ICU has the notion of a collation key, which can be used for such a
>     purpose. (I think we've gotten to the point where ICU is a
>     requirement for UTF-8 modules.) In ICU, the collation key is locale
>     dependent. (For example, Germans sort accent marks differently than
>     French. In Spanish dictionaries, at least older ones, ch come before
>     ca.) I really don't see any way around having a static collation for
>     a module. If so, the collation would need to be fixed wrt either a
>     fixed locale or a locale based upon the language of the module.

DM's suggestion (not merely the part pertaining to ICU) sounds good to 
me. It does represent a rather radical change since it's a proposal for 
a whole new driver type, but that might be what we need in order to get 
the kind of flexibility we need going forward.
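
As a rough illustration of what a fixed, per-module collation key buys us 
over byte order, here is a minimal sketch. It uses only Python's standard 
library, not ICU and not the Sword API, and the function name sort_key 
and the sample entries are hypothetical. The key simply strips accents 
and folds case, which is far cruder than a real locale-aware ICU 
collation key, but it shows the shape of the idea: compute a key once 
per entry and order the index by that key instead of by raw bytes.

```python
import unicodedata


def sort_key(entry):
    """Crude stand-in for a collation key (assumption: NOT what ICU computes).

    Primary part: base letters only, accents removed and case folded.
    Secondary part: the original entry, used only to break ties.
    """
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), entry)


# Hypothetical Greek dictionary entries.
entries = ["ἄγω", "βίος", "ἀγάπη", "Ζεύς", "αἷμα"]

# Byte order of UTF-8 is the same as code-point order. Capital Ζ (U+0396)
# precedes every lowercase Greek letter, so Ζεύς sorts first here.
byte_order = sorted(entries, key=lambda s: s.encode("utf-8"))

# Key order ignores accents and case for the primary comparison.
keyed_order = sorted(entries, key=sort_key)

print(byte_order)
print(keyed_order)
```

A real driver would presumably store keys generated under one fixed 
locale, so that every frontend sees the same entry order.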

> ICU is not a requirement for using UTF-8 modules; rather than use ICU, 
> most frontends (certainly BPBible, GnomeSword, BibleTime and I think 
> MacSword as well) have defined their own string manager code (generally 
> using the platform - qt, glib or python).

DM is really correct that we're coming to the point where ICU is going 
to be a necessity for app i18n/l10n. ICU provides up-to-date collation 
and normalization facilities that are a necessity for correctly managing 
Unicode data in anything other than a braindead manner (like our 
byte-ordered LD entries currently are). Searching, including functions 
like accent normalization and correct case folding, isn't possible 
without a certain level of Unicode knowledge within the app. And when we 
actually think about doing lookup via transliteration (something every 
other piece of professional Bible software handles) we can either go to 
the effort of rolling our own transliteration facility or use the 
ready-made one provided in ICU (as Logos does).
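
To show what accent normalization and case folding mean for search, here 
is a small sketch using Python's standard unicodedata module as a 
stand-in. It is not ICU, and the helper names fold_for_search and matches 
are hypothetical. ICU's normalizer and case-folding facilities cover the 
same ground more completely, and nothing here attempts transliteration.

```python
import unicodedata


def fold_for_search(text):
    """Reduce text to a search form: decompose, drop accents, fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() also maps final sigma to ordinary sigma, which a plain
    # lower() would not do.
    return stripped.casefold()


def matches(query, text):
    """Accent- and case-insensitive substring test (hypothetical helper)."""
    return fold_for_search(query) in fold_for_search(text)


# An unaccented, lowercase query still finds the accented, capitalized word.
print(matches("λογος", "Ἐν ἀρχῇ ἦν ὁ Λόγος"))
```

A plain byte comparison would miss this match, because the accented and 
unaccented forms are different code points.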

MacSword may be exempt from needing ICU for a while, as would any other 
MacOS or iPhone program, for the simple fact that much of ICU's 
functionality should be available through platform APIs. That's because 
Apple has included ICU on both of these platforms, though it won't ever 
be the most recent release and may lack some data.

> Personally, BPBible doesn't use ICU for two reasons - the extra size for 
> ICU and the transliterators it supplies. When compiling with ICU, it 
> adds transliteration filters, which are really buggy - crashes, mixed up 
> xml, etc.

The extra download size added by ICU data is 3 MB, less than the size of 
two Bibles. In 2009, I can't see anyone complaining about a 3 MB increase 
in download size. Even PDAs and cell phones are shipping with gigs of 
memory.

Regarding stability of the transliterators, I've just disabled all but 
the primary Latin transliterators, which should eliminate most problems. 
If problems remain, please let us know (preferably via the bug tracker). 
We can add some of the other Latin-oriented transliterators back at a 
later date, once we've checked them and established their stability.

Put simply, complete i18n and l10n of Sword and Sword frontends aren't 
within our reach without ICU.

--Chris