[sword-devel] Module upload: FreLXX

David Haslam dfhdfh at protonmail.com
Mon Jun 11 04:46:37 MST 2018


Further clarification and observations about the SWORD filter for UTF8GreekAccents...

My reply of 7th June was sent before I was informed about the source code for UTF8GreekAccents.
In fact, this does make use of the mapping table that I provided in March 2017. Thanks, Troy!

You can visit the latest version in SVN trunk here.
https://crosswire.org/svn/sword/trunk/src/modules/filters/utf8greekaccents.cpp

Please note that it was patched during the weekend to add the lines to process GREEK KORONIS & COMBINING GREEK KORONIS,
as well as to remove a residual (unused) declaration left over from the original version. Thanks, Troy.

We may have been wondering why the filter still includes a line to remove the RIGHT SINGLE QUOTATION MARK:
      converters[0x2019] = ""; // RIGHT SINGLE QUOTATION MARK

This is because the source text in some older accented Greek modules used this Unicode character.
These usually occur at end-of-word positions, with typically 1218 occurrences.
More recent editions of the Greek NT use the GREEK KORONIS 0x1FBD in all these same locations.
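
To illustrate how a per-code-point table of this kind behaves, here is a minimal sketch. It is not the code in utf8greekaccents.cpp: the names nextCodePoint, makeConverters and stripMarks are hypothetical, the decoder assumes well-formed UTF-8, and mapping both koronis characters to an empty string is an assumption about what the weekend patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumes well-formed input; a real filter would need validation.
static std::uint32_t nextCodePoint(const std::string &s, std::size_t &i) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    std::uint32_t cp = c & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Hypothetical table: code point -> replacement text ("" means "remove").
std::map<std::uint32_t, std::string> makeConverters() {
    std::map<std::uint32_t, std::string> converters;
    converters[0x2019] = ""; // RIGHT SINGLE QUOTATION MARK
    converters[0x1FBD] = ""; // GREEK KORONIS (assumed to be removed)
    converters[0x0343] = ""; // COMBINING GREEK KORONIS (assumed to be removed)
    return converters;
}

// Replace every mapped code point; copy everything else byte for byte.
std::string stripMarks(const std::string &text,
                       const std::map<std::uint32_t, std::string> &converters) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t start = i;
        std::uint32_t cp = nextCodePoint(text, i);
        auto it = converters.find(cp);
        if (it != converters.end())
            out += it->second;                  // mapped: emit replacement
        else
            out.append(text, start, i - start); // unmapped: keep original bytes
    }
    return out;
}
```

Because the lookup is keyed on the code point alone, a table like this removes 0x2019 from any text it is applied to, whether or not the neighbouring letters are Greek.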

Modules with 0x2019 include MorphGNT, TischMorph and 2TGreek.
Modules with 0x1FBD include SBLG_THE.

FIO. The only Greek letters ever followed by this character (0x2019) are typified by the following analysis (extracted from MorphGNT).
Count Pattern
0034 δ’
0107 θ’
0233 τ’
0292 π’
0213 λ’
0132 φ’
0061 ρ’
0149 ι’
The counts vary slightly for different modules.
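
A tally like the one above could be produced along the following lines. This is only a sketch, not the method actually used for the analysis: countBeforeMark and nextCodePoint are hypothetical names, the input is assumed to be well-formed UTF-8, and the result is keyed by the code point of the preceding character rather than printed as a pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumes well-formed input; no validation is attempted.
static std::uint32_t nextCodePoint(const std::string &s, std::size_t &i) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    std::uint32_t cp = c & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// For each occurrence of `mark`, count the code point immediately before it.
std::map<std::uint32_t, int> countBeforeMark(const std::string &text,
                                             std::uint32_t mark) {
    std::map<std::uint32_t, int> counts;
    std::uint32_t prev = 0; // 0 = no previous character yet
    std::size_t i = 0;
    while (i < text.size()) {
        std::uint32_t cp = nextCodePoint(text, i);
        if (cp == mark && prev != 0)
            ++counts[prev];
        prev = cp;
    }
    return counts;
}
```

Calling countBeforeMark(text, 0x2019) on the module text and then again with 0x1FBD would allow the two conventions to be compared module by module.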

We should consider the conjecture that the first ever digitisation of (e.g.) the Tischendorf NT was simply transcribed incorrectly.
i.e. 0x2019 was keyed everywhere one would nowadays expect to use a GREEK KORONIS.
Maybe the task was performed between Unicode 1.0 (October 1991) and Unicode 1.1 (June 1993)?

Aside: It's very likely that digitisation took place before Unicode even existed, and that the text was subsequently converted to Unicode.
Some of you may remember Claremont-Michigan encodings for Hebrew, Aramaic and Greek.

So, rather than being a bug in SWORD, in retrospect it looks more like an accommodation to a systematic transcription error in some NT Greek text sources.
What we should do about it remains an open question.
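
One possible direction, following the suggestion in the 7 June message quoted below that the filter could apply only in the context of Greek letters, is sketched here. It is an illustration under stated assumptions, not a proposed patch: stripApostropheAfterGreek, isGreek and nextCodePoint are hypothetical names, "Greek" is approximated by the Greek and Coptic (0x0370 to 0x03FF) and Greek Extended (0x1F00 to 0x1FFF) blocks, and only the character immediately before 0x2019 is inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumes well-formed input; no validation is attempted.
static std::uint32_t nextCodePoint(const std::string &s, std::size_t &i) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    std::uint32_t cp = c & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Rough approximation: anything in the two Greek Unicode blocks.
static bool isGreek(std::uint32_t cp) {
    return (cp >= 0x0370 && cp <= 0x03FF) || (cp >= 0x1F00 && cp <= 0x1FFF);
}

// Drop U+2019 only when the preceding character is Greek;
// leave it untouched everywhere else (e.g. in French text).
std::string stripApostropheAfterGreek(const std::string &text) {
    std::string out;
    std::uint32_t prev = 0; // 0 = no previous character yet
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t start = i;
        std::uint32_t cp = nextCodePoint(text, i);
        bool drop = (cp == 0x2019 && isGreek(prev));
        if (!drop)
            out.append(text, start, i - start);
        prev = cp;
    }
    return out;
}
```

With a check of this kind, the Greek elision mark would still be removed, while a French phrase such as the Exodus 3:15 example in the thread below would keep its apostrophe.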

One new question arises from the changes to the SWORD filter (2017 & 2018).
Has anything similar been done for the equivalent JSword filter?

Best regards.

David

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On 7 June 2018 8:06 PM, David Haslam <dfhdfh at protonmail.com> wrote:

> This ongoing problem affects far too many module releases.
> The immediate cause is a wrong assumption implemented in the confmaker script.
>
> The UTF8GreekAccents filter does not restrict its filtering to accents joined or adjacent to letters in the Greek alphabet.
> And by "accents" please remember that some of these are actually Unicode punctuation marks.
> It applies the filter "willy-nilly" no matter what the context in terms of language, script or alphabet.
> It's a one-way valve that should never be used "backwards" to determine whether or not it should be present in the .conf file.
>
> Aside: The other UTF8 filters are not like this, so it's OK for confmaker to use them for testing to see if they are required.
>
> The set of Unicode characters filtered by UTF8GreekAccents is not unique to the Koine Greek language.
> Some of them are found in many other languages.
>
> It's theoretically feasible to redesign the filter such that it applies only in the context of Greek letters.
> So yes, this is a matter for SWORD developers to consider too.
> I documented a suitable mapping table in my GitHub repo in March 2017. See
> https://github.com/DavidHaslam/UTF8-Greek-Accents
>
> It was discussed in this mailing list at the time.
> Troy was unwilling to replace the existing filter on the grounds that it does what it was designed for on accented Greek modules.
> The point is this. It was never designed to be used in general to test whether it is needed by a module.
> When used for this unintended "backwards" purpose, it generally gives the wrong answer.
>
> This concept is not difficult to understand.
>
> Unless and until the filter itself is redesigned, we need a compromise workaround for the confmaker script.
> My suggestion is to restrict applying this "backwards" test to only the modules in which this line is present.
>
> Lang=grc
>
> This would largely prevent the ongoing spurious addition of this filter due to the automation of module publishing.
> One can imagine there may be corner cases, such as where (e.g.) a French Bible module had study notes which included some accented Greek words.
> But the impact would be minimal by not having the filter in the conf file in such rare cases.
>
> Best regards,
>
> David
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On 7 June 2018 7:25 PM, DM Smith <dmsmith at crosswire.org> wrote:
>
>> I think it is a bug in the SWORD engine if single right quotation mark is seen as a Greek diacritic.
>>
>> Will look later to verify.
>>
>> If it is then the module should not have the option.
>>
>> — DM Smith
>>
>> On Jun 7, 2018, at 8:54 AM, "refdoc at gmx.net" <refdoc at gmx.net> wrote:
>>
>>> If a Greek accent is in use, the filter will be there. If this is a bug, i.e. there should not be a Greek accent, please highlight this at source. I guess this is the right approach here too. Then the next iteration will not have a spurious filter.
>>>
>>> Sent from my mobile. Please forgive shortness, typos and weird autocorrects.
>>>
>>> -------- Original Message --------
>>> Subject: Re: [sword-devel] Module upload: FreLXX
>>> From: David Haslam
>>> To: SWORD Developers' Collaboration Forum
>>> CC:
>>>
>>>> This line in frelxx.conf is superfluous:
>>>>
>>>> GlobalOptionFilter=UTF8GreekAccents
>>>>
>>>> I think it's triggered in confmaker script by the presence of these characters.
>>>> U+2019 ’ 656 RIGHT SINGLE QUOTATION MARK
>>>>
>>>> NB. The source text is inconsistent in which character is used for the typographical apostrophe. cf.
>>>> U+0027 ' 39,200 APOSTROPHE
>>>>
>>>> Example:
>>>> Exodus 3:13 contains "les fils d'Israël" (character U+0027 used)
>>>> Exodus 3:15 contains "aux fils d’Israël" (character U+2019 used)
>>>>
>>>> When the Greek Accents filter is disabled (in Xiphos) the latter becomes "aux fils dIsraël" (without the apostrophe).
>>>>
>>>> There are no Greek letters in the module, so the GreekAccents filter should not be included.
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>>
>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>
>>>> On 4 June 2018 7:38 AM, wrote:
>>>>
>>>>> Dear All,
>>>>>
>>>>> This is to announce that we have just now uploaded FreLXX.
>>>>>
>>>>> This is an updated version of FreLXX.
>>>>>
>>>>> Many thanks to update for the hard work.
>>>>>
>>>>> yours
>>>>>
>>>>> The Module Team
>>>>>
>>>>> P.S.: This email is sent automatically on upload of a new/updated module
>>>>>
>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>
>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>
>>>>> Instructions to unsubscribe/change your settings at above page