[sword-devel] Module upload: FreLXX

David Haslam dfhdfh at protonmail.com
Mon Jun 11 07:49:49 MST 2018


Further to my earlier reply today, observe that the section of code where the RIGHT SINGLE QUOTATION MARK is removed is not 100% specific to the Greek language.

//first just remove combining characters
converters[0x2019] = ""; // RIGHT SINGLE QUOTATION MARK
converters[0x1FBF] = ""; // GREEK PSILI
converters[0x2CFF] = ""; // COPTIC MORPHOLOGICAL DIVIDER
converters[0xFE24] = ""; // COMBINING MACRON LEFT HALF
converters[0xFE25] = ""; // COMBINING MACRON RIGHT HALF
converters[0xFE26] = ""; // COMBINING CONJOINING MACRON
converters[0x0300] = ""; // COMBINING GRAVE ACCENT
converters[0x0301] = ""; // COMBINING ACUTE ACCENT
converters[0x0302] = ""; // COMBINING CIRCUMFLEX ACCENT
converters[0x0308] = ""; // COMBINING DIAERESIS
converters[0x0313] = ""; // COMBINING COMMA ABOVE
converters[0x0314] = ""; // COMBINING REVERSED COMMA ABOVE
converters[0x037A] = ""; // GREEK YPOGEGRAMMENI
converters[0x0342] = ""; // COMBINING GREEK PERISPOMENI
converters[0x1FBD] = ""; // GREEK KORONIS
converters[0x0343] = ""; // COMBINING GREEK KORONIS
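
To make the effect of this table concrete, here is a minimal sketch of what "just remove" amounts to when it is applied without regard to script. It is not the SWORD implementation: the function name stripListed and the use of std::u32string (text already decoded to code points) are assumptions made for illustration, whereas the real filter works on the module's UTF-8 text.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper, not SWORD code: drops every code point listed in the
// table above, whatever the language or script of the surrounding text.
std::u32string stripListed(const std::u32string &text) {
    static const std::unordered_set<char32_t> removed = {
        0x2019, 0x1FBF, 0x2CFF, 0xFE24, 0xFE25, 0xFE26, 0x0300, 0x0301,
        0x0302, 0x0308, 0x0313, 0x0314, 0x037A, 0x0342, 0x1FBD, 0x0343
    };
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        if (removed.count(c) == 0) {
            out.push_back(c);  // not in the table: keep unchanged
        }
    }
    return out;
}
```

Given the FreLXX phrase "d’Israël" (with U+2019), this sketch returns "dIsraël", which is the behaviour reported from Xiphos further down the thread. The precomposed ë (U+00EB) is untouched because it is not in the table.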

Assuming that UTF-8 is normalized to NFC during module build or earlier,
there are many non-Greek character sets and alphabets where some of the above combining characters can survive the normalization process.
They remain as separate characters in the module text wherever there is no corresponding precomposed character.
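
As an illustration of the alternative raised earlier in this thread (a filter that acts only in the context of Greek letters), the following sketch removes a listed mark only when the nearest preceding unlisted character is Greek. Everything here is an assumption for the sake of the example: the names isGreek and stripAfterGreekOnly are hypothetical, and "Greek" is approximated by the Greek and Coptic block (U+0370 to U+03FF) plus the Greek Extended block (U+1F00 to U+1FFF), which also contain a few non-letters.

```cpp
#include <string>
#include <unordered_set>

// Rough approximation of "Greek context": two Unicode blocks, not a
// proper script-property lookup.
bool isGreek(char32_t c) {
    return (c >= 0x0370 && c <= 0x03FF) || (c >= 0x1F00 && c <= 0x1FFF);
}

// Hypothetical context-aware variant: a listed mark is removed only if the
// last unlisted character before it is Greek.
std::u32string stripAfterGreekOnly(const std::u32string &text) {
    static const std::unordered_set<char32_t> marks = {
        0x2019, 0x1FBF, 0x2CFF, 0xFE24, 0xFE25, 0xFE26, 0x0300, 0x0301,
        0x0302, 0x0308, 0x0313, 0x0314, 0x037A, 0x0342, 0x1FBD, 0x0343
    };
    std::u32string out;
    char32_t base = 0;  // last character seen that is not a listed mark
    for (char32_t c : text) {
        if (marks.count(c) == 0) {
            base = c;
            out.push_back(c);
        } else if (!isGreek(base)) {
            out.push_back(c);  // mark is not attached to Greek: keep it
        }
        // otherwise the mark follows a Greek character and is dropped
    }
    return out;
}
```

With this approach a Cyrillic а followed by U+0301 (a stress mark, for which Unicode has no precomposed form) is left alone, while δ’ still loses its U+2019. A closing quotation mark placed directly after a Greek word would also be removed, so this is only a rough approximation of what a real redesign would need.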

For such modules, applying the filter will result in a change.
Hence my concern (expressed in the idiom "willy-nilly") that using the filter to determine whether it should be specified in the .conf file is not always a good idea.
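
The workaround proposed earlier in the thread (run the "backwards" test only for modules that declare Lang=grc) could be pictured roughly as below. This is not taken from the confmaker script; all names are hypothetical, and filterWouldChange is a deliberately crude stand-in for "apply UTF8GreekAccents and compare the result" that checks for just two of the removed code points.

```cpp
#include <algorithm>
#include <string>

// Crude stand-in for "run the filter and see whether the text changes".
// Only U+2019 and U+0301 are checked here, to keep the sketch short.
bool filterWouldChange(const std::u32string &text) {
    return std::any_of(text.begin(), text.end(), [](char32_t c) {
        return c == 0x2019 || c == 0x0301;
    });
}

// Decide whether GlobalOptionFilter=UTF8GreekAccents should go in the .conf.
// The filter-based test is trusted only when the module declares Lang=grc.
bool needsGreekAccentsOption(const std::string &lang,
                             const std::u32string &text) {
    if (lang != "grc") {
        return false;  // never infer the option for non-Greek modules
    }
    return filterWouldChange(text);
}
```

Under this rule a French module containing U+2019 never gets the option, at the cost of the corner cases already acknowledged in the thread, such as Greek words in the study notes of a non-Greek module.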

Best regards,

David

Sent with [ProtonMail](https://protonmail.com) Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On 11 June 2018 12:46 PM, David Haslam <dfhdfh at protonmail.com> wrote:

> Further clarification and observations about the SWORD filter for UTF8GreekAccents...
>
> My reply of 7th June was sent before I was informed about the source code for UTF8GreekAccents.
> In fact, this does make use of the mapping table that I provided in March 2017. Thanks, Troy!
>
> You can visit the latest version in SVN trunk here.
> https://crosswire.org/svn/sword/trunk/src/modules/filters/utf8greekaccents.cpp
>
> Please note that it was patched during the weekend to add the lines to process GREEK KORONIS & COMBINING GREEK KORONIS,
> as well as to remove a residual (unused) declaration left over from the original version. Thanks, Troy.
>
> We may have been wondering why the filter still includes a line to remove the RIGHT SINGLE QUOTATION MARK:
>       converters[0x2019] = ""; // RIGHT SINGLE QUOTATION MARK
>
> This is because the source text in some older accented Greek modules used this Unicode character.
> These are usually found at End of Word locations, with typically 1218 occurrences.
> More recent editions of the Greek NT use the GREEK KORONIS 0x1FBD in all these same locations.
>
> Modules with 0x2019 include MorphGNT, TischMorph and 2TGreek.
> Modules with 0x1FBD include SBLG_THE.
>
> FIO. The only Greek letters ever followed by the character are typified by the following analysis (extracted from MorphGNT).
> Count Pattern
> 0034 δ’
> 0107 θ’
> 0233 τ’
> 0292 π’
> 0213 λ’
> 0132 φ’
> 0061 ρ’
> 0149 ι’
> The counts vary slightly for different modules.
>
> We should consider the conjecture that the first ever digitisation of (e.g.) the Tischendorf NT was simply transcribed incorrectly.
> i.e. 0x2019 was keyed everywhere one would nowadays expect to use a GREEK KORONIS.
> Maybe the task was performed between Unicode 1.0 (October 1991) and Unicode 1.1 (June 1993)?
>
> Aside: It's very likely that digitisation took place before Unicode even existed, and that the text was subsequently converted to Unicode.
> Some of you may remember Claremont-Michigan encodings for Hebrew, Aramaic and Greek.
>
> So, rather than being a bug in SWORD, in retrospect it looks more like an accommodation to a systematic transcription error in some NT Greek text sources.
> What we should do about it remains an open question.
>
> One new question arises from the changes to the SWORD filter (2017 & 2018).
> Has anything similar been done for the equivalent JSword filter?
>
> Best regards.
>
> David
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On 7 June 2018 8:06 PM, David Haslam <dfhdfh at protonmail.com> wrote:
>
>> This ongoing problem affects far too many module releases.
>> The immediate cause is a wrong assumption implemented in the confmaker script.
>>
>> The UTF8GreekAccents filter does not restrict its filtering to accents joined or adjacent to letters in the Greek alphabet.
>> And by "accents" please remember that some of these are actually Unicode punctuation marks.
>> It applies the filter "willy-nilly" no matter what the context in terms of language, script or alphabet.
>> It's a one-way valve that should never be used "backwards" to determine whether or not it should be present in the .conf file.
>>
>> Aside: The other UTF8 filters are not like this, so it's OK for confmaker to use them for testing to see if they are required.
>>
>> The set of Unicode characters filtered by UTF8GreekAccents is not unique to the Koine Greek language.
>> Some of them are found in many other languages.
>>
>> It's theoretically feasible to redesign the filter such that it applies only in the context of Greek letters.
>> So yes, this is a matter for SWORD developers to consider too.
>> I documented a suitable mapping table in my GitHub repo in March 2017. See
>> https://github.com/DavidHaslam/UTF8-Greek-Accents
>>
>> It was discussed in this mailing list at the time.
>> Troy was unwilling to replace the existing filter on the grounds that it does what it was designed for on accented Greek modules.
>> The point is this. It was never designed to be used in general to test whether it is needed by a module.
>> When used for this unintended "backwards" purpose, it generally gives the wrong answer.
>>
>> This concept is not difficult to understand.
>>
>> Unless and until the filter itself is redesigned, we need a compromise workaround for the confmaker script.
>> My suggestion is to restrict applying this "backwards" test to only the modules in which this line is present.
>>
>> Lang=grc
>>
>> This would largely prevent the ongoing spurious addition of this filter due to the automation of module publishing.
>> One can imagine there may be corner cases, such as where (e.g.) a French Bible module had study notes which included some accented Greek words.
>> But the impact would be minimal by not having the filter in the conf file in such rare cases.
>>
>> Best regards,
>>
>> David
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On 7 June 2018 7:25 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> I think it is a bug in the SWORD engine if single right quotation mark is seen as a Greek diacritic.
>>>
>>> Will look later to verify.
>>>
>>> If it is then the module should not have the option.
>>>
>>> — DM Smith
>>>
>>> On Jun 7, 2018, at 8:54 AM, "refdoc at gmx.net" <refdoc at gmx.net> wrote:
>>>
>>>> If a Greek accent is in use, the filter will be there. If this is a bug, i.e. there should not be a Greek accent, please highlight this at source. I guess this is the right approach here too. Then the next iteration will not have a spurious filter.
>>>>
>>>> Sent from my mobile. Please forgive shortness, typos and weird autocorrects.
>>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [sword-devel] Module upload: FreLXX
>>>> From: David Haslam
>>>> To: SWORD Developers' Collaboration Forum
>>>> CC:
>>>>
>>>>> This line in frelxx.conf is superfluous:
>>>>>
>>>>> GlobalOptionFilter=UTF8GreekAccents
>>>>>
>>>>> I think it's triggered in confmaker script by the presence of these characters.
>>>>> U+2019 ’ 656 RIGHT SINGLE QUOTATION MARK
>>>>>
>>>>> NB. The source text is inconsistent in which character is used for the typographical apostrophe. cf.
>>>>> U+0027 ' 39,200 APOSTROPHE
>>>>>
>>>>> Example:
>>>>> Exodus 3:13 contains "les fils d'Israël" (character U+0027 used)
>>>>> Exodus 3:15 contains "aux fils d’Israël" (character U+2019 used)
>>>>>
>>>>> When the Greek Accents filter is disabled (in Xiphos) the latter becomes "aux fils dIsraël" (without the apostrophe).
>>>>>
>>>>> There are no Greek letters in the module, so the GreekAccents filter should not be included.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> David
>>>>>
>>>>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>>>>>
>>>>> On 4 June 2018 7:38 AM, wrote:
>>>>>
>>>>>> Dear All,
>>>>>>
>>>>>> This is to announce that we have just now uploaded FreLXX.
>>>>>>
>>>>>> This is an updated version of FreLXX.
>>>>>>
>>>>>> Many thanks to update for the hard work.
>>>>>>
>>>>>> yours
>>>>>>
>>>>>> The Module Team
>>>>>>
>>>>>> P.S.: This email is sent automatically on upload of a new/updated module
>>>>>>
>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>
>>>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>>>>
>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>