[sword-devel] Transliteration

Chris Little sword-devel@crosswire.org
Fri, 26 Sep 2003 18:19:48 -0700


I'm revamping our transliteration code and would appreciate the opinions
of others on what we should offer/present to users.

The current release is designed to convert any script into Latin or
Latin-1.  Additionally, Greek can be transliterated into BGreek format,
and Greek, Coptic, and Hebrew can be transliterated into Beta format.
(When Beta or BGreek is selected, all other scripts are converted into
Latin also.)

The original implementation permitted transliteration into any of about
27 scripts aside from Latin, but the whole text had to be transliterated
into the same script.  That is useful in some cases but not in most.  E.g.
Russian users might find it beneficial to have Greek to Cyrillic
transliteration, but I can't see anyone wanting Thai to Ugaritic.  Most
of these transliterations between two non-Latin scripts are inaccurate,
too, since they use Latin as an intermediate script (i.e. Thai to
Ugaritic is really Thai to Latin followed by Latin to Ugaritic).
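
To illustrate why going through Latin loses accuracy, here is a minimal
sketch in Python.  The two mapping tables and the function names are
made up for this example; they are not the rules used by SWORD or ICU.

```python
# Toy tables for illustration only; NOT the actual SWORD/ICU rules.
GREEK_TO_LATIN = {"θ": "th", "ε": "e", "ο": "o", "σ": "s", "ς": "s"}
LATIN_TO_CYRILLIC = {"t": "т", "h": "х", "e": "е", "o": "о", "s": "с"}


def transliterate(text, table):
    """Map each character through `table`; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def via_latin(text, to_latin, from_latin):
    """Chain two transliterators, using Latin as the intermediate script."""
    return transliterate(transliterate(text, to_latin), from_latin)


latin = transliterate("θεος", GREEK_TO_LATIN)
cyrillic = via_latin("θεος", GREEK_TO_LATIN, LATIN_TO_CYRILLIC)

# The single Greek letter theta became "th" in Latin, and the second step
# has no way of knowing that "th" was one letter, so it emits two Cyrillic
# letters.  A direct Greek-to-Cyrillic table could have used one.
print(latin, cyrillic)
```

The second step only ever sees Latin text, so any distinction the first
step flattened cannot be recovered.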

For comparison, the only similar product that offers transliteration,
Logos (Series X version 2.1 Alpha 3), lets the user toggle
transliteration to Latin individually for each of about 10 scripts.
In other words, you can transliterate Greek to Latin but leave Hebrew
unchanged.  Additionally, for Greek, Hebrew, & Syriac, the user can
select from a variety of transliteration formats for each script
individually.  For example, Greek can be transliterated in SBL format,
Hebrew in Beta format, and Syriac in Hugoye format.

One of the improvements to our transliteration implementation has been
the addition of about 40 new transliterators (so far).  The bulk of
these are variant transliterators that follow various standards (SBL,
UNGEGN, ALA-LC, ISO, etc.).

Logos' user settings are presented in a pair of dialog boxes.  Ours are
exposed through the GlobalOptionFilter mechanism, which means they're a
menu item in BibleCS.

I'm concerned that users won't understand how their preferences operate
within our transliteration system.  For example, when they view a Syriac
text and select SBL transliteration, they won't realize that it falls
back on the default Latin transliteration because no SBL variant exists.
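
The silent fallback described above can be sketched roughly as follows.
The registry contents, the transliterator names, and the resolve()
function are assumptions for illustration, not the actual SWORD code;
the only detail taken from this message is that Syriac has no SBL
variant.

```python
# Hypothetical registry: (script, variant) -> transliterator name.
# Entries are illustrative; this is not the real list of transliterators.
AVAILABLE = {
    ("Greek", "default"): "Greek-Latin",
    ("Greek", "SBL"): "Greek-Latin/SBL",
    ("Hebrew", "default"): "Hebrew-Latin",
    ("Hebrew", "SBL"): "Hebrew-Latin/SBL",
    ("Syriac", "default"): "Syriac-Latin",  # no SBL variant for Syriac
}


def resolve(script, variant):
    """Pick a transliterator for `script` given ONE global variant choice.

    Returns (name, fell_back).  If the requested variant does not exist
    for this script, the script's default is used instead.
    """
    key = (script, variant)
    if key in AVAILABLE:
        return AVAILABLE[key], False
    return AVAILABLE[(script, "default")], True


for script in ("Greek", "Hebrew", "Syriac"):
    name, fell_back = resolve(script, "SBL")
    note = " (fallback, invisible to the user)" if fell_back else ""
    print(f"{script}: {name}{note}")
```

With a single global option, the user selects "SBL" once and has no way
to see which scripts actually honoured it.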

So, my thought is to separate the single transliterator option we 
currently have into script-specific options, e.g. one for Greek, one for 
Hebrew, one for Gothic, etc.
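
As a rough sketch of what script-specific options could look like, each
script gets its own option whose allowed values are only the variants
that exist for that script.  The value lists, the "Off" value, and the
function names below are hypothetical and do not reflect the
GlobalOptionFilter API.

```python
# Hypothetical per-script value lists; illustrative only.
VARIANTS = {
    "Greek": ["Off", "Latin", "SBL", "BGreek", "Beta"],
    "Hebrew": ["Off", "Latin", "SBL", "Beta"],
    "Gothic": ["Off", "Latin"],
}


def build_options(variants):
    """Create one option per script, each starting at its first value."""
    return {script: values[0] for script, values in variants.items()}


def set_option(options, variants, script, value):
    """Set one script's option, rejecting values that script does not offer."""
    if value not in variants[script]:
        raise ValueError(f"{script} has no {value!r} transliteration")
    options[script] = value


options = build_options(VARIANTS)
set_option(options, VARIANTS, "Greek", "SBL")
print(options)
```

Because each script lists only its own variants, a choice such as SBL
for Gothic cannot be selected in the first place, so there is nothing to
fall back from.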

Does that sound good or bad, or does anyone have a better idea?

I posted a test version of BibleCS with the latest transliteration code 
and a new ICU dll at 
http://www.crosswire.org/sword/ALPHAcckswwlkrfre22034820285912/alpha/sword-icutest.zip
Expect bugs; I'm still debugging many of the transliterators.

--Chris