[sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules
David Haslam
dfhmch at googlemail.com
Wed Feb 22 08:09:55 MST 2017
A similar test on my Panjabi module (still WIP), but using
GlobalOptionFilter=UTF8HebrewPoints,
showed that the filter made no difference to the output.
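The fact that the vulgar fraction ¾ was left unaltered proves that this filter
does not use Unicode compatibility normalisation (NFKD) to decompose combined
characters. ¾ has only a compatibility decomposition, so canonical NFD would
leave it alone anyway.
No surprises there!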
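Here is why ¾ is a useful probe. This is a minimal Python sketch using the standard `unicodedata` module, not SWORD code. U+00BE has only a compatibility decomposition, so the canonical forms (NFC, NFD) leave it intact. Only the compatibility forms (NFKC, NFKD) split it into `3`, U+2044 FRACTION SLASH, `4`. The function name `normal_forms` is purely illustrative.

```python
import unicodedata

# U+00BE VULGAR FRACTION THREE QUARTERS
THREE_QUARTERS = "\u00be"

def normal_forms(text):
    """Return text under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

if __name__ == "__main__":
    for form, result in normal_forms(THREE_QUARTERS).items():
        # Print code points so any decomposition is visible
        print(form, " ".join(f"U+{ord(c):04X}" for c in result))
```

Running it prints `U+00BE` for NFC and NFD, and `U+0033 U+2044 U+0034` for NFKC and NFKD.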
Hebrew diacritics do not generally occur in our modules except as separate
combining characters.
If applied before or during module build, normalization merely changes the
order of Hebrew diacritics, sorting them by their canonical combining class.
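A small sketch of that reordering, again using Python's `unicodedata`. It assumes the common typing order: letter, then dagesh, then vowel. Canonical ordering sorts combining marks by combining class, and POINT PATAH is 17 while POINT DAGESH is 21. As a result, both NFC and NFD move the vowel in front of the dagesh. NFC does not recompose the result into U+FB31, because that character is excluded from composition. The helper name `describe` is hypothetical.

```python
import unicodedata

# HEBREW LETTER BET + POINT DAGESH + POINT PATAH (dagesh typed before the vowel)
typed = "\u05d1\u05bc\u05b7"

def describe(text):
    """List (code point, canonical combining class) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in text]

if __name__ == "__main__":
    for form in ("NFC", "NFD"):
        print(form, describe(unicodedata.normalize(form, typed)))
```

Both forms yield `[('U+05D1', 0), ('U+05B7', 17), ('U+05BC', 21)]`. The marks are the same; only their order has changed.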
The exceptions to this are the composite Hebrew characters in the range
U+FB1D to U+FB4F, most of which normalization decomposes into a base letter
plus combining points.
These composite characters would be left unchanged by our two UTF8 filters
for Biblical Hebrew:
GlobalOptionFilter=UTF8HebrewPoints
GlobalOptionFilter=UTF8Cantillation
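Such composites are easy to spot in module source. The following is a minimal Python sketch with a hypothetical function name; it is not part of SWORD. It reports each code point in U+FB1D to U+FB4F together with its canonical (NFD) decomposition. It assumes that filters like the two above match standalone combining points, which fits the ¾ test described earlier. Under that assumption, a point built into a character such as U+FB2A would pass through them untouched.

```python
import unicodedata

# Hebrew portion of the Alphabetic Presentation Forms block
HEBREW_PF_FIRST = 0xFB1D
HEBREW_PF_LAST = 0xFB4F

def find_hebrew_presentation_forms(text):
    """Return (index, code point, name, NFD decomposition) for each composite."""
    hits = []
    for index, ch in enumerate(text):
        if HEBREW_PF_FIRST <= ord(ch) <= HEBREW_PF_LAST:
            decomposed = unicodedata.normalize("NFD", ch)
            hits.append((
                index,
                f"U+{ord(ch):04X}",
                # A few code points in this range are unassigned
                unicodedata.name(ch, "<unassigned>"),
                " ".join(f"U+{ord(c):04X}" for c in decomposed),
            ))
    return hits

if __name__ == "__main__":
    # SHIN + SHIN DOT as separate characters, then the precomposed U+FB2A
    sample = "\u05e9\u05c1 \ufb2a"
    for hit in find_hebrew_presentation_forms(sample):
        print(hit)
```

Only the precomposed character is reported: `(3, 'U+FB2A', 'HEBREW LETTER SHIN WITH SHIN DOT', 'U+05E9 U+05C1')`.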
This is merely an observation. I do not advocate any change to these
filters.
It's unlikely that any of these composite characters will be present in our
modules.
Most of these are akin to what in other scripts would be called presentation
forms. Indeed, Unicode places them in the Alphabetic Presentation Forms block.
BTW, one character in this range is a Yiddish ligature!
U+FB1F HEBREW LIGATURE YIDDISH YOD YOD PATAH
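For the curious, you can check the ligature's canonical decomposition with the same `unicodedata` module. This short sketch uses an illustrative variable name. It should show U+05F2 HEBREW LIGATURE YIDDISH DOUBLE YOD, from the main Hebrew block, followed by U+05B7 HEBREW POINT PATAH.

```python
import unicodedata

# U+FB1F HEBREW LIGATURE YIDDISH YOD YOD PATAH
ligature = "\ufb1f"

print(unicodedata.name(ligature))
# List each character of the canonical decomposition
for ch in unicodedata.normalize("NFD", ligature):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```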
Best regards,
David
PS. "Unicode normalization can break Biblical Hebrew." (but that's another
matter!)
--
View this message in context: http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656798.html
Sent from the SWORD Dev mailing list archive at Nabble.com.