[sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

David Haslam dfhmch at googlemail.com
Wed Feb 22 08:09:55 MST 2017


A similar test on my Panjabi module (still WIP) but using

GlobalOptionFilter=UTF8HebrewPoints

demonstrated that the filter made no difference to the output.

The fact that the vulgar fraction ¾ was left unaltered shows that this filter
does not use Unicode compatibility normalisation (NFKD) to decompose combined
characters.

No surprises in this!
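For anyone wanting to repeat the check, here is a minimal sketch using Python's standard unicodedata module (nothing SWORD-specific). U+00BE carries only a compatibility mapping, so canonical decomposition (NFD) leaves it intact while compatibility decomposition (NFKD) splits it into 3, U+2044 FRACTION SLASH and 4. An unaltered ¾ therefore rules out NFKD/NFKC in particular.

```python
import unicodedata

# U+00BE VULGAR FRACTION THREE QUARTERS has a <fraction> compatibility mapping.
three_quarters = "\u00be"
print(unicodedata.decomposition(three_quarters))  # <fraction> 0033 2044 0034

# Canonical decomposition ignores compatibility mappings ...
nfd = unicodedata.normalize("NFD", three_quarters)
# ... whereas compatibility decomposition splits the fraction apart.
nfkd = unicodedata.normalize("NFKD", three_quarters)

print(nfd == three_quarters)        # True
print([hex(ord(c)) for c in nfkd])  # ['0x33', '0x2044', '0x34']
```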

Hebrew diacritics do not generally occur in our modules except as separate
combining characters.
If applied before or during the module build, normalization merely reorders
the Hebrew diacritics according to their canonical combining classes.
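To illustrate the reordering, here is a small sketch with Python's unicodedata module. The example word (bet with dagesh and qamats, typed with the dagesh first) is just an illustration. Qamats has combining class 18 and dagesh 21, so normalization sorts them; the letter itself is not recomposed, because U+FB31 HEBREW LETTER BET WITH DAGESH is excluded from canonical composition.

```python
import unicodedata

BET, DAGESH, QAMATS = "\u05d1", "\u05bc", "\u05b8"

# Canonical combining classes: QAMATS = 18, DAGESH = 21.
print(unicodedata.combining(QAMATS), unicodedata.combining(DAGESH))  # 18 21

# Marks typed in the order dagesh, qamats ...
typed = BET + DAGESH + QAMATS
normalized = unicodedata.normalize("NFC", typed)

# ... come out sorted by combining class; no precomposed U+FB31 appears,
# because the Hebrew presentation forms are composition exclusions.
print(normalized == BET + QAMATS + DAGESH)  # True
```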

The exceptions to this are the composite Hebrew characters in the range
U+FB1D to U+FB4F.
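A quick way to survey that range is sketched below, again with Python's unicodedata module; the classify helper is purely illustrative. It reports, for each assigned code point, whether the Unicode decomposition mapping is canonical, compatibility-only, or absent. U+FB31, for instance, decomposes canonically to bet + dagesh, while U+FB4F HEBREW LIGATURE ALEF LAMED has only a compatibility mapping.

```python
import unicodedata

def classify(cp: int) -> str:
    """Illustrative helper: classify a code point by its decomposition mapping."""
    ch = chr(cp)
    if unicodedata.name(ch, None) is None:
        return "unassigned"
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "no decomposition"
    # Compatibility mappings begin with a tag such as <font> or <compat>.
    return "compatibility" if mapping.startswith("<") else "canonical"

for cp in range(0xFB1D, 0xFB50):
    kind = classify(cp)
    if kind != "unassigned":
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: {kind}")
```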

These composite characters would be left unchanged by our two UTF8 filters
made for Biblical Hebrew.

GlobalOptionFilter=UTF8HebrewPoints
GlobalOptionFilter=UTF8Cantillation
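Why they slip through can be sketched in a few lines of Python. The strip_points function and its POINTS table below are hypothetical stand-ins, assumed only for illustration; they are not the actual character table of SWORD's UTF8HebrewPoints. Any filter that matches individual combining marks behaves the same way: a separate dagesh is removed, but one baked into a precomposed letter is never seen unless the text is decomposed first.

```python
import unicodedata

# ASSUMPTION: simplified stand-in for a point-stripping filter. It drops the
# Hebrew points U+05B0..U+05BC (vowels plus dagesh); this range is illustrative
# and is NOT the exact table used by SWORD's UTF8HebrewPoints.
POINTS = {chr(cp) for cp in range(0x05B0, 0x05BD)}

def strip_points(text: str) -> str:
    """Hypothetical filter: remove every character found in POINTS."""
    return "".join(ch for ch in text if ch not in POINTS)

decomposed = "\u05d1\u05bc"  # bet + separate combining dagesh
precomposed = "\ufb31"       # U+FB31 HEBREW LETTER BET WITH DAGESH

print(strip_points(decomposed) == "\u05d1")   # True: dagesh removed
print(strip_points(precomposed) == "\ufb31")  # True: composite left unchanged

# Decomposing first (NFD) would expose the dagesh to the filter.
print(strip_points(unicodedata.normalize("NFD", precomposed)) == "\u05d1")  # True
```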

This is merely an observation. I do not advocate any change to these
filters.
It's unlikely that any of the combined characters will be present in our
modules.
Most of these are akin to what in other scripts would be called Presentation
Forms.

BTW, one character in this range is a Yiddish ligature!
U+FB1F HEBREW LIGATURE YIDDISH YOD YOD PATAH

Best regards,

David

PS. "Unicode normalization can break Biblical Hebrew." (but that's another
matter!)

--
View this message in context: http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656798.html
Sent from the SWORD Dev mailing list archive at Nabble.com.