[sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules
Peter Von Kaehne
refdoc at gmx.net
Tue Feb 21 03:10:05 MST 2017
Thanks David and Troy,
What is happening is this: my script tests for the presence of Greek accents by doing a before-and-after comparison using the Greek accent strip filter. The same approach works beautifully for the Hebrew stuff (vowels and breathing marks). It should work for the Greek accent filter too, but it does not.
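To make the before-and-after test concrete, here is a minimal sketch in Python. It is not the code from confmaker.pl and it does not call the SWORD filter. strip_greek_accents is a hypothetical stand-in that drops combining marks only when they follow a Greek base letter, and needs_greek_accents_option is an illustrative name for the comparison step. It assumes the input is valid UTF-8 that has already been decoded to a string.

```python
import unicodedata

# Greek and Coptic block, and Greek Extended block.
GREEK_RANGES = ((0x0370, 0x03FF), (0x1F00, 0x1FFF))


def is_greek(ch):
    """True if the character lies in one of the Greek blocks above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in GREEK_RANGES)


def strip_greek_accents(text):
    """Hypothetical stand-in for an accent filter (NOT the SWORD one).

    Decomposes to NFD, drops combining marks that follow a Greek base
    letter, leaves all other characters alone, and recomposes to NFC.
    """
    out = []
    prev_base_is_greek = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if prev_base_is_greek:
                continue  # accent or breathing on a Greek letter: drop it
        else:
            prev_base_is_greek = is_greek(ch)
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


def needs_greek_accents_option(text):
    """Before-and-after test: did stripping change the (NFC) text?"""
    return strip_greek_accents(text) != unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for sample in ("Ἐν ἀρχῇ ἦν ὁ λόγος", "Na počátku bylo Slovo"):
        print(sample, "->", needs_greek_accents_option(sample))
```

With a stripper that behaves like this, Czech or Finnish text comes back unchanged and the option would not be added. If the real filter reports a change on such text, the comparison is doing its job and the filter is altering something it should not.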
The script is at sword-tools/modules/conf/confmaker.pl. Right now the Greek accents option is commented out, so please have a look at the version svn-head-1.
I do not think I am using the filter incorrectly in my script, though of course I am keen to hear about any mistakes in my use of it.
I noted this a year or two ago and remarked on it on the mailing list. I simply left my script as it was, since it seemed correct and, to the best of my understanding, the problem was in the library.
Peter
> Sent: Tuesday, 21 February 2017 at 09:04
> From: "David Haslam" <dfhmch at googlemail.com>
> To: sword-devel at crosswire.org
> Subject: Re: [sword-devel] GlobalOptionFilter=UTF8GreekAccents and non-Greek modules
>
> Hi Troy,
>
> Surely there's no doubt the module source text was correctly encoded as
> UTF-8 and normalised to NFC?
>
> We can examine the output of mod2imp and see that it is. Or am I missing
> something?
>
> mod2imp doesn't change the normalisation form, and I assume it doesn't
> change the encoding either.
>
> CzeCEP is not the only recent module to which the script has added
> GlobalOptionFilter=UTF8GreekAccents.
> FinRK was released yesterday and suffers from the same issue.
>
> What I think has happened is this:
>
> The Greek Accents filter was probably never adequately beta tested.
>
> It was accepted after only being alpha tested, to see that it does remove
> Greek accents from Greek text that has some.
>
> Nobody thought to check whether it did anything untoward on the UTF-8
> encoded text in a variety of non-Greek scripts. The bug has gone undetected
> until yesterday. It's either a very old bug, or a library has changed
> without anyone noticing.
>
> I understand that the Module Team's script does the following as part of the
> automation to build the module conf file:
>
> It applies this filter, checks for change, then adds the filter line to the
> conf file if a change was detected.
>
> Knowing this, it's not hard to see how we have ended up with a spurious
> Greek Accents filter in some recently released modules, is it?
>
> The mopping-up containment action is to determine how many modules have been
> released with the spurious filter in the configuration file. Each of these must
> be corrected by removing the line, updating the version and date, and
> releasing the update.
>
> The permanent solution should be to find out exactly how this filter works
> in detail, and rewrite it if necessary. That would require an update to
> SWORD as a significant bug fix.
>
> The most recent mention of this filter in SWORD releases was under 1.5.10
> dated 20-Nov-2006 in which you added a further Greek accent. In fact, that's
> the only explicit mention. The string "utf8" appears earlier a few times,
> but in a more general sense.
>
> NB. Using diatheke version 4.7, I have thoroughly tested CzeCEP for the
> four other UTF8 filters. Only GreekAccents is delinquent.
>
> Best regards,
>
> David
>
> PS. If only CrossWire had a "bug bounty" scheme.... Ah, but we're a
> "non-income" organization.
> Looking only to the heavenly reward, and the fruit of the Gospel here in
> earth. :)
>
> --
> View this message in context: http://sword-dev.350566.n4.nabble.com/GlobalOptionFilter-UTF8GreekAccents-and-non-Greek-modules-tp4656719p4656729.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>