[sword-devel] Algorithmic transliteration by ICU and Armenian punctuation marks
David Haslam
dfhmch at googlemail.com
Sun Mar 5 11:20:09 MST 2017
This is just to record an observation I made when tinkering with diatheke the
other day.
Algorithmic transliteration by ICU does not replace Armenian punctuation
marks.
To illustrate, the counts of the following characters in the transliterated
output are identical to those in the original text.
U+055B ՛ 11,553 ARMENIAN EMPHASIS MARK
U+055C ՜ 449 ARMENIAN EXCLAMATION MARK
U+055D ՝ 70,737 ARMENIAN COMMA
U+055E ՞ 3,522 ARMENIAN QUESTION MARK
U+0589 ։ 30,366 ARMENIAN FULL STOP
U+058A ֊ 1,126 ARMENIAN HYPHEN
U+2024 ․ 6 ONE DOT LEADER (used as the Armenian semicolon)
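
A minimal sketch of how counts like these could be gathered, assuming the
original and transliterated texts are already available as Python strings
(for example, captured from diatheke output). The function name count_marks
is hypothetical, and this is not necessarily how the figures above were
produced.

```python
import unicodedata
from collections import Counter

# The seven code points listed above.
ARMENIAN_MARKS = "\u055B\u055C\u055D\u055E\u0589\u058A\u2024"

def count_marks(text):
    """Return {character: count} for each listed mark, including zero counts."""
    counts = Counter(ch for ch in text if ch in ARMENIAN_MARKS)
    return {ch: counts[ch] for ch in ARMENIAN_MARKS}

def report(text):
    """Print one line per mark: code point, count, Unicode character name."""
    for ch, n in count_marks(text).items():
        print(f"U+{ord(ch):04X} {n:>7,} {unicodedata.name(ch)}")
```

Running count_marks on both the original and the transliterated text and
comparing the two dictionaries would show whether any of the marks were
replaced.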
Clearly this is an upstream issue for ICU, but as I'm layers removed from
that, I thought it worthwhile at least recording here, on the off chance
that someone more involved might care to take it up.
For comparison, I recently did a similar exercise with the Gurmukhi
script (for the Punjabi language), and was very pleased to observe that
Gurmukhi punctuation marks were suitably replaced by those we use in English
and other Latin script languages.
As for what replacements should be used, most of these are obvious from the
Unicode character names.
The only one that needs a better-informed choice is the first: what to
use for the emphasis mark.
This being a phonetic construct, my own suggestion would be to use
U+02B9 ʹ MODIFIER LETTER PRIME
though there may turn out to be something more appropriate.
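
Until this is addressed upstream, the replacements could be approximated
with a simple post-processing step. The sketch below is only an
illustration: the function name is hypothetical, the Latin equivalents are
read off the Unicode character names, the semicolon for U+2024 follows the
usage noted above, and U+02B9 for the emphasis mark is just the tentative
suggestion made here. It swaps characters in place and does not attempt to
reposition any mark within a word or sentence.

```python
# Hypothetical replacement table, not an ICU rule set.
PUNCTUATION_MAP = str.maketrans({
    "\u055B": "\u02B9",  # ARMENIAN EMPHASIS MARK -> MODIFIER LETTER PRIME (tentative)
    "\u055C": "!",       # ARMENIAN EXCLAMATION MARK
    "\u055D": ",",       # ARMENIAN COMMA
    "\u055E": "?",       # ARMENIAN QUESTION MARK
    "\u0589": ".",       # ARMENIAN FULL STOP
    "\u058A": "-",       # ARMENIAN HYPHEN
    "\u2024": ";",       # ONE DOT LEADER, used as the Armenian semicolon
})

def replace_armenian_punctuation(text):
    """Replace each Armenian punctuation mark with its assumed Latin equivalent."""
    return text.translate(PUNCTUATION_MAP)
```

Because the table only touches the seven punctuation code points, it can be
applied either before or after the ICU transliteration of the letters.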
Best regards,
David
--
View this message in context: http://sword-dev.350566.n4.nabble.com/Algorithmic-transliteration-by-ICU-and-Armenian-punctuation-marks-tp4656909.html