[sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules

David Haslam dfhmch at googlemail.com
Tue Feb 21 02:04:29 MST 2017


Hi Troy,

Surely there's no doubt the module source text was correctly encoded as
UTF-8 and normalised to NFC? 

We can examine the output of mod2imp and see that it is. Or am I missing
something?

mod2imp doesn't change the normalisation form, and I assume it doesn't
change the encoding either.
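
For anyone wanting to repeat that check, here is a minimal Python sketch. It assumes the mod2imp output has been saved to a file and is read as raw bytes. The function name is purely illustrative and is not part of any SWORD tool.

```python
import unicodedata


def is_utf8_nfc(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and is already in NFC."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all.
        return False
    # Text is in NFC exactly when normalising to NFC leaves it unchanged.
    return unicodedata.normalize("NFC", text) == text
```

It could be called as `is_utf8_nfc(open("CzeCEP.imp", "rb").read())`, where the file name is only an example.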

CzeCEP is not the only recent module to which the script has added
GlobalOptionFilter=UTF8GreekAccents.
FinRK was released yesterday and suffers the same issue.

What I think has happened is this:

The Greek Accents filter was probably never adequately beta tested.

It was accepted after only being alpha tested, to see that it does remove
Greek accents from Greek text that has some.

Nobody thought to check whether it did anything untoward on the UTF-8
encoded text in a variety of non-Greek scripts.  The bug has gone undetected
until yesterday. It's either a very old bug, or a library has changed
without anyone noticing.

I understand that the Module Team's script does the following as part of the
automation to build the module conf file:

It applies this filter, checks for change, then adds the filter line to the
conf file if a change was detected.

Knowing this, it's not hard to see how we have ended up with a spurious
Greek Accents filter in some recently released modules, is it?
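
The detection step described above can be sketched roughly as follows. This is not the Module Team's actual script, only an illustration of the logic as I understand it. `strip_accents` stands in for the SWORD filter, and `buggy_filter` is a deliberately wrong, hypothetical stand-in for whatever the real filter is doing to non-Greek text.

```python
from typing import Callable, List

FILTER_LINE = "GlobalOptionFilter=UTF8GreekAccents"


def conf_with_detected_filter(
    conf_lines: List[str],
    sample_text: str,
    strip_accents: Callable[[str], str],
) -> List[str]:
    """Append the filter line if applying the filter changes the text."""
    result = list(conf_lines)
    changed = strip_accents(sample_text) != sample_text
    if changed and FILTER_LINE not in result:
        result.append(FILTER_LINE)
    return result


def buggy_filter(text: str) -> str:
    # Hypothetical misbehaviour: alters a Czech letter it should never touch.
    return text.replace("\u0159", "r")
```

With a filter that wrongly alters Czech text, the change check succeeds and the conf file gains a line it should not have. With a filter that leaves the text alone, nothing is added.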

The mopping-up containment action is to determine how many modules have been
released with the spurious filter in the configuration file. Each of these
must be corrected by removing the line, updating the version and date, and
releasing the update.
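
A rough way to find candidates is sketched below. It assumes the conf files sit together in one directory and each carries a Lang= line, and it treats only "el" and "grc" as Greek. Both the function names and the language list are my own assumptions. A non-Greek module can legitimately contain Greek text, so every hit would still need checking by hand.

```python
from pathlib import Path
from typing import List

FILTER_LINE = "GlobalOptionFilter=UTF8GreekAccents"
GREEK_LANGS = {"el", "grc"}  # assumption: the only codes treated as Greek


def conf_lang(lines: List[str]) -> str:
    """Return the value of the first Lang= entry, or '' if there is none."""
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Lang":
            return value.strip()
    return ""


def suspect_confs(conf_dir) -> List[str]:
    """List conf files that have the filter line but a non-Greek Lang."""
    suspects = []
    for path in sorted(Path(conf_dir).glob("*.conf")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        has_filter = any(line.strip() == FILTER_LINE for line in lines)
        # Compare only the primary subtag, so "grc-x-foo" still counts as Greek.
        primary = conf_lang(lines).split("-")[0].lower()
        if has_filter and primary not in GREEK_LANGS:
            suspects.append(path.name)
    return suspects
```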

The permanent solution should be to find out exactly how this filter works
in detail, and rewrite it if necessary. That would require an update to
SWORD as a significant bug fix.
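
To make the expected behaviour concrete, here is a Python sketch of an accent stripper that only touches Greek letters. It is not the SWORD implementation, and I am not assuming SWORD removes exactly the same set of marks. This version drops every combining mark that follows a Greek base letter, including breathings and iota subscript. It also returns NFC, so input that was not already NFC will come back normalised.

```python
import unicodedata


def is_greek(ch: str) -> bool:
    """True for characters in the Greek and Greek Extended blocks."""
    cp = ord(ch)
    return 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF


def strip_greek_accents(text: str) -> str:
    """Drop combining marks attached to Greek letters; leave the rest alone."""
    out = []
    base_is_greek = False
    # NFD splits precomposed letters into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_greek:
                continue  # mark belongs to a Greek letter: remove it
        else:
            base_is_greek = is_greek(ch)
        out.append(ch)
    # Recompose so that untouched non-Greek letters return to NFC.
    return unicodedata.normalize("NFC", "".join(out))
```

Any rewritten filter ought to pass the same kind of check: Greek text loses its accents, and NFC text in other scripts comes back unchanged.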

The most recent mention of this filter in SWORD releases was under 1.5.10
dated 20-Nov-2006 in which you added a further Greek accent. In fact, that's
the only explicit mention. The string "utf8" appears earlier a few times,
but in a more general sense.

NB. Using diatheke version 4.7, I have thoroughly tested CzeCEP with the
four other UTF8 filters. Only GreekAccents is delinquent.

Best regards,

David

PS. If only CrossWire had a "bug bounty" scheme.... Ah, but we're a
"non-income" organization. 
Looking only to the heavenly reward, and the fruit of the Gospel here in
earth. :)




