[sword-devel] Algorithmic transliteration by ICU and Armenian punctuation marks

Sun Mar 5 11:20:09 MST 2017

This is just to record an observation I made when tinkering with diatheke the
other day.

Algorithmic transliteration by ICU does not replace Armenian punctuation
marks.

To illustrate, the count of each of the following characters in the
transliterated output is the same as in the original text.

Code    Char  Count   Name
U+055B  ՛     11,553  ARMENIAN EMPHASIS MARK
U+055C  ՜     449     ARMENIAN EXCLAMATION MARK
U+055D  ՝     70,737  ARMENIAN COMMA
U+055E  ՞     3,522   ARMENIAN QUESTION MARK
U+0589  ։     30,366  ARMENIAN FULL STOP
U+058A  ֊     1,126   ARMENIAN HYPHEN
U+2024  ․     6       ONE DOT LEADER (used as the Armenian semicolon)
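
For anyone wanting to repeat this check, here is a minimal Python sketch of
the kind of comparison described above. It is not the procedure actually
used for the figures in this post; the function names are hypothetical, and
it assumes the original and transliterated texts are already available as
strings (for example, captured from diatheke output).

```python
from collections import Counter

# The seven punctuation marks listed in the table above.
ARMENIAN_PUNCTUATION = "\u055B\u055C\u055D\u055E\u0589\u058A\u2024"


def count_marks(text, marks=ARMENIAN_PUNCTUATION):
    """Return a dict mapping each mark to its number of occurrences in text."""
    counts = Counter(ch for ch in text if ch in marks)
    return {ch: counts.get(ch, 0) for ch in marks}


def unchanged_marks(original, transliterated):
    """List the marks that occur in the original and whose count is the
    same in the transliterated text, i.e. marks that were not replaced."""
    before = count_marks(original)
    after = count_marks(transliterated)
    return [ch for ch in before if before[ch] > 0 and before[ch] == after[ch]]
```

If every mark present in the original comes back from unchanged_marks(),
the transliteration has left the punctuation untouched.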

Clearly this is an upstream issue for ICU, but as I'm layers removed from
that, I thought it worthwhile at least recording here, on the off chance
that someone more involved might care to take it up.

For comparison, I recently did a similar exercise with the Gurmukhi
script (for the Punjabi language), and was very pleased to observe that
Gurmukhi punctuation marks were suitably replaced by those we use in English
and other Latin-script languages.

As for what replacements should be used, most of these are obvious from the
Unicode character names.
The only one that needs a better-informed choice is the first: what to
use for the emphasis mark.
Since this is a phonetic construct, my own suggestion would be to use
U+02B9	ʹ	MODIFIER LETTER PRIME
though there may turn out to be something more appropriate.
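
Until something changes upstream, the replacements could be applied as a
post-processing step on the transliterated text. The following is only a
sketch in Python: the Latin equivalents are my reading of the Unicode
character names, the semicolon for U+2024 follows the note in the table
above, U+02B9 for the emphasis mark is just the suggestion made here, and
the function name is hypothetical. It is not ICU's own rule syntax.

```python
# Assumed one-to-one replacements; every target here is a suggestion,
# not an established ICU mapping.
PUNCTUATION_MAP = str.maketrans({
    "\u055B": "\u02B9",  # ARMENIAN EMPHASIS MARK -> MODIFIER LETTER PRIME
    "\u055C": "!",       # ARMENIAN EXCLAMATION MARK
    "\u055D": ",",       # ARMENIAN COMMA
    "\u055E": "?",       # ARMENIAN QUESTION MARK
    "\u0589": ".",       # ARMENIAN FULL STOP
    "\u058A": "-",       # ARMENIAN HYPHEN
    "\u2024": ";",       # ONE DOT LEADER, used as the Armenian semicolon
})


def replace_armenian_punctuation(text):
    """Replace each Armenian punctuation mark with its assumed Latin
    counterpart, leaving all other characters as they are."""
    return text.translate(PUNCTUATION_MAP)
```

Note that this substitutes each mark in place; it makes no attempt to move
a mark to a different position in the word or sentence.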

Best regards,

David
