[sword-devel] GlobalOptionFilter=UTF8GreekAccents

David Haslam dfhdfh at protonmail.com
Mon Mar 17 16:21:02 EDT 2025


Here's a creative suggestion for how to use OSIS to encode words with an elision marker.

Make use of the abbr element, thus, eg:

> <abbr expansion="ἀλλά">ἀλλ’</abbr>
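
A minimal sketch of how a front-end might consume such markup, assuming the fragment is well-formed XML. The `render` helper and the `<w>` wrapper element are hypothetical names used only for illustration; they are not part of the SWORD API.

```python
import xml.etree.ElementTree as ET


def render(fragment: str, expand: bool = False) -> str:
    """Return the text of an OSIS-like fragment.

    With expand=False the elided form (element content) is shown;
    with expand=True the value of the expansion attribute is used instead.
    """
    # Wrap the fragment so that it always has a single root element.
    root = ET.fromstring(f"<w>{fragment}</w>")
    parts = [root.text or ""]
    for child in root:
        expansion = child.get("expansion")
        if child.tag == "abbr" and expand and expansion:
            parts.append(expansion)
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


sample = '<abbr expansion="ἀλλά">ἀλλ’</abbr>'
print(render(sample))               # elided form, as written in the text
print(render(sample, expand=True))  # full form taken from the attribute
```

The same attribute could in principle feed a search index, so that a query for the full word also finds the elided one, but that is a design choice rather than anything the markup itself dictates.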

Haven't we been discussing abbreviations in another thread? 😎
You read it first here, folks! 😉

Best regards,

David

Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.

On Monday, March 17th, 2025 at 6:46 PM, David Haslam <dfhdfh at protonmail.com> wrote:

> Hi DM,
>
> With Xiphos 4.3.1 (latest update) when I searched TischMorph either for "δι’ ἡμερῶν", or for "δι’ ημερων", there were 2 results:
>
> - Mark 2:1
> - Acts 1:3
>
> Search results were no different with the Greek Accents on or off. I therefore conclude that your hunch was incorrect!
>
> Aside:
>
> - After an exact phrase search, both results preview correctly.
> - After a Lucene fast search, both results preview really [weirdly](https://www.dropbox.com/scl/fi/msw6s8dl4au5z0optwm5l/Screenshot-2025-03-17-18.43.04.png?rlkey=wps1isdrh9h1atdck6r7ihbol&dl=0) & [weirdly](https://www.dropbox.com/scl/fi/4aiyelopdy1a1gjlpto5f/Screenshot-2025-03-17-18.44.12.png?rlkey=bc1qmql18faoti9b6o6o27qeu&dl=0) !!! I think this should be reported to Karl K. Might it be a software bug?
>
> Best regards,
>
> David
>
> On Monday, March 17th, 2025 at 6:17 PM, DM Smith <dmsmith at crosswire.org> wrote:
>
>> David,
>> I’m not sure that the filter is only used for display. I think it may also be used for search. In Ancient Greek, we don’t want to have to include U+2019 as part of the search request, but just the letters.
>>
>> As a reader of NT Greek, it doesn’t bother me to have δ αρχαια rather than δ’ αρχαια.
>>
>> BTW, if the filter’s code is changed and if the filter is used for searches, then all indexes of accented NT Greek modules will need to be rebuilt. The user’s search request has to be normalized in exactly the same way as the index was constructed.
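
The point about normalizing the index and the query identically can be sketched as follows. This is an illustration only, not the actual UTF8GreekAccents code: the `fold` function and its `keep_elision` flag are hypothetical, and the approach (decompose, drop combining marks, recompose) is an assumption about how such a filter might work.

```python
import unicodedata

ELISION = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def fold(text: str, keep_elision: bool = True) -> str:
    """Strip Greek diacritics; optionally strip the elision mark as well."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        # Combining marks have a non-zero canonical combining class.
        if unicodedata.combining(ch) == 0
        and (keep_elision or ch != ELISION)
    ]
    return unicodedata.normalize("NFC", "".join(kept))


phrase = "δι’ ἡμερῶν"
indexed = fold(phrase, keep_elision=False)  # how an old index might store it
query = fold(phrase, keep_elision=True)     # how a changed filter would fold it
print(indexed == query)  # False: index and query no longer agree
```

If the index was built with one setting and the query is folded with the other, the two strings differ and an exact match fails, which is why the indexes would need rebuilding after such a change.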
>>
>> DM
>>
>>> On Mar 17, 2025, at 11:44 AM, David Haslam <dfhdfh at protonmail.com> wrote:
>>>
>>> Hi DM,
>>>
>>> One impact is on the StatResGNT module, in which both single and double left/right quotation marks have been added by the project leader.
>>> Hiding Greek Accents has the bad effect of losing the end quotation mark for all the level 2 quotations in the text.
>>> NB. It was seeing this project that prompted me to revisit this topic.
>>> It would be a real benefit to this module to make the change that I proposed.
>>>
>>> Further to my initial thoughts late last week, I now agree that U+2019 is the right codepoint choice to mark an elision.
>>> I was somewhat misled by the wrong answer given by Leo AI, which mistakenly told me that it was a way to represent the iota subscript.
>>> It's only since quizzing Grok AI that my thoughts have become clear. I admit that I should've known better, but I'm not a classicist.
>>> Yet the "category mistake" still exists - since an elision marker is not a diacritic. And by definition, a Greek Accent is a diacritic!
>>>
>>> Making the proposed change to the filter should have a minimal effect upon all the other Ancient Greek Bible modules.
>>> The number of words thus affected in a Greek NT module is not huge!
>>> There's really no downside to still displaying the "typographical apostrophe".
>>>
>>> To illustrate, these are the only 21 words in TischMorph that end with U+2019.
>>>
>>>> Word Count
>>>> Δι’ 2
>>>> Κατ’ 1
>>>> δ’ 22
>>>> δι’ 142
>>>> καθ’ 61
>>>> κατ’ 82
>>>> μεθ’ 43
>>>> μετ’ 132
>>>> μηδ’ 1
>>>> οὐδ’ 8
>>>> παρ’ 59
>>>> τοῦτ’ 17
>>>> ἀλλ’ 220
>>>> ἀνθ’ 5
>>>> ἀπ’ 119
>>>> ἀφ’ 44
>>>> Ἀλλ’ 1
>>>> ἐπ’ 143
>>>> ἐφ’ 82
>>>> ὑπ’ 25
>>>> ὑφ’ 9
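
A tally like the one above could be produced along these lines. This is a sketch under assumptions: the text is taken to be plain, whitespace-separated Unicode, the `count_elided` helper is hypothetical, and the sample string is made up for illustration rather than taken from any module.

```python
from collections import Counter

ELISION = "\u2019"            # RIGHT SINGLE QUOTATION MARK
TRAILING = ".,;\u00b7\u0387"  # full stop, comma, Greek question mark, ano teleia


def count_elided(text: str) -> Counter:
    """Count whitespace-separated words that end with U+2019."""
    counts = Counter()
    for token in text.split():
        word = token.rstrip(TRAILING)  # ignore sentence punctuation after the word
        if word.endswith(ELISION):
            counts[word] += 1
    return counts


sample = "ἀλλ’ ἐγὼ δι’ αὐτοῦ, ἀλλ’ οὐ"
for word, n in sorted(count_elided(sample).items()):
    print(word, n)
```

In a module that also uses U+2019 as a closing quotation mark, this simple test would count those words too, so the result would need checking by hand.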
>>>
>>> It's now my considered view that even when the Greek accents are hidden by the filter, the elision marks ought to be retained.
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> On Monday, March 17th, 2025 at 3:06 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> David, I read your Grok 3 analysis.
>>>>
>>>> What is the impact of not having this change? What is the impact of making the change? Is it merely presentation or is there an issue with searching too?
>>>>
>>>> I’ve also been reading https://corp.unicode.org/pipermail/unicode/2019-January/007563.html which was referenced in a prior recent thread on U+2019 in Ancient Greek. This is long and worth reading to understand how it might impact SWORD. The thread is initiated by James Tauber.
>>>>
>>>> TL;DR:
>>>> U+2019 (and in older texts U+0027) in Ancient Greek was never used for quotations and is only used for elision. It is considered the recommended character for elisions.
>>>> Under the TR29 rules (as they stood when the thread was written in January 2019), U+2019 is a word break when it is at the start or end of a word, but not when it is within a word. It is not simply punctuation. These rules are not language aware.
>>>> There is no zero width character in Unicode to join words.
>>>> It is impossible for TR29 to distinguish between U+2019 used as a quotation mark and as an elision.
>>>> There is no other character that is an appropriate replacement for U+2019.
>>>>
>>>> I haven’t yet looked at Unicode TR30 regarding folding rules as it pertains to this.
>>>>
>>>> In Him,
>>>> DM
>>>>
>>>>> On Mar 17, 2025, at 8:46 AM, David Haslam <dfhdfh at protonmail.com> wrote:
>>>>>
>>>>> Dear SWORD developers,
>>>>>
>>>>> I asked about this topic several years ago, and I'm no longer convinced by what we were told back then.
>>>>>
>>>>> After doing further research, it's my understanding that U+2019 RIGHT SINGLE QUOTATION MARK ought not to be hidden by this SWORD filter.
>>>>>
>>>>> - This codepoint is not a diacritic that modifies the previous Greek letter. In other words, it's not a Greek accent.
>>>>> - This codepoint has the Unicode properties of a punctuation mark.
>>>>> - In Ancient Greek text, it's used to mark an elision, where the final vowel of a word is omitted when the next word begins with a vowel.
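
The first two points can be checked against the Unicode Character Database with Python's standard `unicodedata` module. The `describe` helper is a hypothetical name used only for this illustration, and U+0301 is included purely as a contrasting example of a real diacritic.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the name, general category and combining class of a character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),    # "P*" = punctuation, "M*" = mark
        "combining": unicodedata.combining(ch),  # 0 = not a combining character
    }


print(describe("\u2019"))  # the elision mark / right single quotation mark
print(describe("\u0301"))  # a genuine diacritic, for comparison
```

U+2019 has general category `Pf` (final punctuation) and combining class 0, whereas U+0301 COMBINING ACUTE ACCENT has category `Mn` and combining class 230.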
>>>>>
>>>>> To view my research, conducted with the help of Grok 3, please visit the following link.
>>>>>
>>>>> - https://grok.com/share/bGVnYWN5_43ff1922-3876-4d9a-9e42-6ae940007fd0
>>>>>
>>>>> I therefore recommend that SWORD developers revisit the specification for this filter, and update it so that U+2019 is never hidden.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> David
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page