[sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules
David Haslam
dfhmch at googlemail.com
Tue Feb 21 12:05:00 MST 2017
A further idiosyncrasy of the UTF8GreekAccents filter that proves to be an
interesting clue:
It changes U+00BE VULGAR FRACTION THREE QUARTERS ¾ to ordinary 3/4.
Vulgar fractions are about as far as you can get from Koine Greek, are they
not?
This is what I think this proves:
It must first normalize the text to either NFKD or NFKC as a prelude to
removing the "accents".
These are the two compatibility forms among the four Unicode normalization
forms; the two canonical forms are NFD and NFC.
In theory it should then renormalize to NFC once the Greek accents have
been removed.
With the diacritics gone, that step would matter only if the original module
text also contained non-Greek composite characters.
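
To make the suspected sequence concrete, here is a minimal Python sketch using
the standard unicodedata module. It is not the SWORD filter's actual code: the
function name strip_accents is hypothetical, and dropping every code point in
the Combining Diacritical Marks block (U+0300 to U+036F) is a simplifying
assumption about what "removing the accents" means.

```python
import unicodedata

def strip_accents(text):
    # Hypothetical helper, not taken from the SWORD sources.
    # Step 1: compatibility decomposition splits accented letters into
    # a base letter followed by combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 2 (assumption): drop anything in the Combining Diacritical
    # Marks block, U+0300..U+036F, which covers tonos and oxia.
    stripped = "".join(
        ch for ch in decomposed if not 0x0300 <= ord(ch) <= 0x036F
    )
    # Step 3: recompose whatever can still be recomposed.
    return unicodedata.normalize("NFC", stripped)

greek = strip_accents("λόγος")  # the accented omicron loses its tonos
```

Because step 1 is applied to the whole string, any character with a
compatibility mapping is rewritten too, whether or not it is Greek.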
This particular example is of significance in that once you've got "3/4" no
amount of renormalization to NFC would change it back to the special Unicode
vulgar fraction ¾.
Some aspects of Unicode normalization cannot be reversed.
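
The one-way nature of this can be checked with Python's standard unicodedata
module. This is only an illustration of Unicode normalization itself, not of
the filter's code; the variable names are mine.

```python
import unicodedata

original = "\u00be"  # VULGAR FRACTION THREE QUARTERS
# Compatibility decomposition replaces the single character with three.
decomposed = unicodedata.normalize("NFKD", original)
# Canonical composition afterwards has nothing to recombine.
recomposed = unicodedata.normalize("NFC", decomposed)
# List the resulting code points for inspection.
codepoints = [f"U+{ord(ch):04X}" for ch in decomposed]
```

Note that the standard mapping yields U+2044 FRACTION SLASH between the
digits rather than U+002F SOLIDUS, so if the filter's output really contains
a plain "/" there may be a further substitution somewhere that I have not
identified.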
Who'd have thought that my suggestion to use this vulgar fraction character
in one single verse of a Punjabi Bible could later prove to be useful
evidence in the case for the prosecution?
II Chronicles 1:17: ਸੁਲੇਮਾਨ ਦੇ ਵਿਉਪਾਰੀ ਮਿਸਰ ਤੋਂ ਇੱਕ ਰੱਥ ਚਾਂਦੀ ਦੇ 15 ਪੌਂਡ ਦਾ
ਅਤੇ ਇੱਕ ਘੋੜਾ ਚਾਂਦੀ ਦੇ 3¾ ਪੌਂਡ ਦਾ ਖਰੀਦਦੇ ਸਨ । ਫ਼ੇਰ ਉਨ੍ਹਾਂ ਨੇ ਇਹ ਘੋੜੇ ਅਤੇ ਰੱਥ
ਹਿੱਤੀ ਲੋਕਾਂ ਦੇ ਰਾਜਿਆਂ ਅਤੇ ਆਰਾਮ ਦੇ ਰਾਜਿਆਂ ਨੂੰ ਵੇਚ ਦਿੱਤੇ ।
Well there we are, you see. :)
David
--
View this message in context: http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656747.html