<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;">This search result is exactly what I’d expect from the current code as it has been described, and it shows that the filter is being used.<div><br></div><div>The search request is normalized to remove all the accents and also U+2019. The text is normalized in the same fashion. So the search works with or without U+2019. The normalizer might also do other normalizations.</div><div><br></div><div>This is the case with both slow and fast search. BTW, Lucene can also do exact phrase searches, so I presume that by “exact phrase search” you mean slow search.</div><div><br></div><div>Edit the conf to remove the filter and try it again. I expect the fully accented search to work, but probably not the others.</div><div><br></div><div>If you delete the Lucene index and rebuild it after making the conf change, it might no longer work as expected. Then again, I think Lucene has its own normalizers, so it might still work. I’m pretty sure it doesn’t have a custom Greek analyzer but uses a Latin-1 analyzer.</div><div><br></div><div>If the code change is made, then I expect that U+2019 would be required in the search request. It would be treated as a letter, in the same way that only correctly spelled words are found.</div><div><br></div><div>The Xiphos display weirdness is a separate issue.</div><div><br></div><div>Let me note that JSword only has Lucene searches. It uses Lucene's Greek analyzer, which does noise-word (aka stop-word) elimination, stemming, case folding, accent elimination, Unicode normalization, and maybe more. The same analyzer is used both for the search request and for building the index.</div><div><br></div><div>DM</div><div><br><div><div><blockquote type="cite"><div>On Mar 17, 2025, at 3:31 PM, David Haslam &lt;dfhdfh@protonmail.com&gt; wrote:</div><br class="Apple-interchange-newline"><div><div style="font-family: Arial, sans-serif; font-size: 14px;">BTW,
the same two results were obtained by a search for "<span>δι ημερων".</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><span><br></span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><span>i.e., without U+2019 in the search key.</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
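<div style="font-family: Arial, sans-serif; font-size: 14px;">DM describes one normalizer applied to both the verse text and the search key. Under that model, a query with or without U+2019 reduces to the same string, so it matches. The following is only a minimal sketch of that idea. It assumes a normalizer that decomposes to NFD, drops combining marks (Unicode category Mn) and U+2019, and then recomposes to NFC. <code>normalize_for_search</code> and <code>matches</code> are hypothetical names, not SWORD or Lucene APIs.</div>
<pre>
```python
import unicodedata

ELISION_MARK = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def normalize_for_search(text: str) -> str:
    """Hypothetical normalizer: drop combining marks and U+2019."""
    # NFD splits precomposed Greek letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch for ch in decomposed
        if unicodedata.category(ch) != "Mn" and ch != ELISION_MARK
    ]
    # Recompose whatever remains.
    return unicodedata.normalize("NFC", "".join(kept))


def matches(query: str, verse: str) -> bool:
    """Substring match after normalizing query and verse the same way."""
    return normalize_for_search(query) in normalize_for_search(verse)


verse = "δι\u2019 ἡμερῶν"
queries = ["δι\u2019 ἡμερῶν", "δι\u2019 ημερων", "δι ημερων"]
print([matches(q, verse) for q in queries])  # [True, True, True]
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">Matching only works because both sides pass through the same function. If the normalizer were changed to keep U+2019, the text side would have to be re-normalized too, which for a Lucene index means rebuilding it.</div>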
<div class="protonmail_signature_block" style="font-family: Arial, sans-serif; font-size: 14px;">
<div class="protonmail_signature_block-user">
Best regards,<br><br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
<div class="protonmail_signature_block-proton">
Sent with <a target="_blank" href="https://pr.tn/ref/SWXT9A5YZ67G">Proton Mail</a> secure email.
</div>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote">
On Monday, March 17th, 2025 at 6:46 PM, David Haslam &lt;dfhdfh@protonmail.com&gt; wrote:<br>
<blockquote class="protonmail_quote" type="cite">
<div style="font-family: Arial, sans-serif; font-size: 14px;">Hi DM,</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">With <b>Xiphos 4.3.1</b> (latest update) when I searched <b>TischMorph</b> either for "<span style="font-family: system-ui, sans-serif; display: inline !important; background-color: rgb(255, 255, 255);"><span>δι’ ἡμερῶν</span>", or for "<span>δι’ ημερων"</span>, there were 2 results:</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><ol style="margin-top: 0px; margin-bottom: 0px;" data-editing-info="{"orderedStyleType":1,"unorderedStyleType":1}"><li style="font-family: system-ui, sans-serif; list-style-type: "1. ";"><span style="font-family: system-ui, sans-serif; display: inline !important; background-color: rgb(255, 255, 255);">Mark 2:1</span></li><li style="font-family: system-ui, sans-serif; list-style-type: "2. ";"><span style="font-family: system-ui, sans-serif; display: inline !important; background-color: rgb(255, 255, 255);">Acts 1:3</span></li></ol></div><div style="font-family: Arial, sans-serif; font-size: 14px;">Search results were no different with the Greek Accents on or off. 
I therefore conclude that your hunch was incorrect!</div><div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block"><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user"><b><u>Aside:</u></b></div><div class="protonmail_signature_block-user"><ul style="margin-top: 0px; margin-bottom: 0px;" data-editing-info="{&quot;orderedStyleType&quot;:1,&quot;unorderedStyleType&quot;:1}"><li style="list-style-type: disc;">After an exact phrase search, both results preview correctly.</li><li style="list-style-type: disc;" class="protonmail_signature_block-user">After a Lucene fast search, both results preview really weirdly (see <a title="weirdly" href="https://www.dropbox.com/scl/fi/msw6s8dl4au5z0optwm5l/Screenshot-2025-03-17-18.43.04.png?rlkey=wps1isdrh9h1atdck6r7ihbol&dl=0" target="_blank" rel="noreferrer nofollow noopener">screenshot 1</a> and <a title="weirdly" href="https://www.dropbox.com/scl/fi/4aiyelopdy1a1gjlpto5f/Screenshot-2025-03-17-18.44.12.png?rlkey=bc1qmql18faoti9b6o6o27qeu&dl=0" target="_blank" rel="noreferrer nofollow noopener">screenshot 2</a>)! I think this should be reported to Karl K. Might it be a software bug?</li></ul></div><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user">
Best regards,<br><br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
<div class="protonmail_signature_block-proton">
Sent with <a href="https://pr.tn/ref/SWXT9A5YZ67G" target="_blank" rel="noreferrer nofollow noopener">Proton Mail</a> secure email.
</div>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote">
On Monday, March 17th, 2025 at 6:17 PM, DM Smith &lt;dmsmith@crosswire.org&gt; wrote:<br>
<blockquote type="cite" class="protonmail_quote">
David,<div>I’m not sure that the filter is only used for display. I think it may also be used for search. In Ancient Greek, we don’t want users to have to include U+2019 in the search request, just the letters.</div><div><br></div><div>As a reader of NT Greek, it doesn’t bother me to have <span style="text-wrap-mode: wrap; background-color: rgb(255, 255, 255);">δ αρχαια rather than </span><span style="text-wrap-mode: wrap; background-color: rgb(255, 255, 255);">δ’ αρχαια.</span></div><div><span style="text-wrap-mode: wrap; background-color: rgb(255, 255, 255);"><br></span></div><div><span style="background-color: rgb(255, 255, 255);">BTW, if the filter’s code is changed and the filter is used for searches, then all indexes of accented NT Greek modules will need to be rebuilt. The user’s search request has to be normalized in exactly the same way as the text was when the index was built.</span></div><div><br></div><div>DM<br id="lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On Mar 17, 2025, at 11:44 AM, David Haslam &lt;dfhdfh@protonmail.com&gt; wrote:</div><br class="Apple-interchange-newline"><div><div style="font-family: Arial, sans-serif; font-size: 14px;">Hi DM,</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">One impact is on the <b>StatResGNT</b> module, in which the project leader has added both single and double left/right quotation marks.<br>Hiding Greek Accents has the unwanted effect of removing the closing quotation mark from every level-2 quotation in the text.</div><div style="font-family: Arial, sans-serif; font-size: 14px;">NB. 
<i>It was seeing this project that prompted me to revisit this topic</i>.</div><div style="font-family: Arial, sans-serif; font-size: 14px;">It would be a real benefit to this module to make the change that I proposed.</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">Further to my initial thoughts late last week, I now agree that U+2019 is the right codepoint choice to mark an elision.</div><div style="font-family: Arial, sans-serif; font-size: 14px;">I was somewhat misled by an incorrect answer from <b>Leo AI</b>, which told me that it was <span>a way to represent the <b>iota subscript</b>.</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;">It's only since quizzing <b>Grok AI</b> that my thoughts have become clear. <i>I admit that I should've known better, but I'm not a classicist</i>.</div><div style="font-family: Arial, sans-serif; font-size: 14px;">Yet the filter still makes a "category mistake", since an elision marker is <u>not</u> a diacritic. And by definition, a Greek Accent <u>is</u> a diacritic!</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">Making the proposed change to the filter should have a minimal effect upon all the other Ancient Greek Bible modules.<br>The number of words<span style="display: inline !important; background-color: rgb(255, 255, 255);"><span> </span>thus affected</span> in a Greek NT module is not huge! 
</div><div style="font-family: Arial, sans-serif; font-size: 14px;">There's really no downside to still displaying the "typographical apostrophe".</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">To illustrate, these are the only 21 words in <b>TischMorph</b> that end with U+2019.</div><blockquote style="border-left: 3px solid rgb(200, 200, 200); border-top-color: rgb(200, 200, 200); border-right-color: rgb(200, 200, 200); border-bottom-color: rgb(200, 200, 200); padding-left: 10px; color: rgb(102, 102, 102);"><div style="font-family: Arial, sans-serif; font-size: 14px;"><u>Word</u> <u>Count</u><div><span>Δι’ 2</span></div><div><span>Κατ’ 1</span></div><div><span>δ’ 22</span></div><div><span>δι’ 142</span></div><div><span>καθ’ 61</span></div><div><span>κατ’ 82</span></div><div><span>μεθ’ 43</span></div><div><span>μετ’ 132</span></div><div><span>μηδ’ 1</span></div><div><span>οὐδ’ 8</span></div><div><span>παρ’ 59</span></div><div><span>τοῦτ’ 17</span></div><div><span>ἀλλ’ 220</span></div><div><span>ἀνθ’ 5</span></div><div><span>ἀπ’ 119</span></div><div><span>ἀφ’ 44</span></div><div><span>Ἀλλ’ 1</span></div><div><span>ἐπ’ 143</span></div><div><span>ἐφ’ 82</span></div><div><span>ὑπ’ 25</span></div><div><span>ὑφ’ 9</span></div></div></blockquote><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
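<div style="font-family: Arial, sans-serif; font-size: 14px;">How the word list above was produced is not stated. Here is a minimal sketch of one way to tally such words. It assumes the verse text has already been extracted to plain Unicode text with the module markup removed, and that words are separated by whitespace. <code>elided_word_counts</code> is a hypothetical helper, not part of any SWORD tool.</div>
<pre>
```python
from collections import Counter

ELISION_MARK = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def elided_word_counts(text: str) -> Counter:
    """Count whitespace-separated tokens that end with U+2019."""
    return Counter(
        token for token in text.split() if token.endswith(ELISION_MARK)
    )


# Made-up sample text, not taken from the module.
sample = "ἀλλ\u2019 ἐκεῖνος δι\u2019 ἡμερῶν ἀλλ\u2019 οὐκ ἔστιν"
counts = elided_word_counts(sample)
print(len(counts), sum(counts.values()))  # 2 3
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">No case folding is applied, so, as in the list, Δι’ and δι’ are counted separately. In a text like StatResGNT, where U+2019 also closes level-2 quotations, a test like this would count those quotation ends as well, because the character alone cannot tell the two uses apart.</div>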
<div class="protonmail_signature_block" style="font-family: Arial, sans-serif; font-size: 14px;">
<div class="protonmail_signature_block-user">It's now my considered view that even when the Greek accents are hidden by the filter, the elision marks ought to be retained.</div><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user">
Best regards,<br><br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
<div class="protonmail_signature_block-proton">
Sent with <a rel="noreferrer nofollow noopener" target="_blank" href="https://pr.tn/ref/SWXT9A5YZ67G">Proton Mail</a> secure email.
</div>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote">
On Monday, March 17th, 2025 at 3:06 PM, DM Smith &lt;dmsmith@crosswire.org&gt; wrote:<br>
<blockquote class="protonmail_quote" type="cite">
David, I read your Grok 3 analysis.<div><br></div><div><div>What is the impact of not having this change? What is the impact of making the change? Is it merely a matter of presentation, or is there an issue with searching too?</div><div><br></div><div>I’ve also been reading <a href="https://corp.unicode.org/pipermail/unicode/2019-January/007563.html" target="_blank" rel="noreferrer nofollow noopener">https://corp.unicode.org/pipermail/unicode/2019-January/007563.html</a>, which was referenced in a recent earlier thread on U+2019 in Ancient Greek. It is long, but worth reading to understand how the issue might impact SWORD. The thread was initiated by James Tauber.<div><br></div><div>TL;DR:</div><div>In Ancient Greek, U+2019 (and, in older texts, U+0027) was never used for quotations; it is used only for elision. It is considered the recommended character for elisions.</div><div>Under the rules of Unicode TR29 (as they stood when the thread was written in January 2019), U+2019 is a word break at the start or end of a word, but not within a word. It is not simply punctuation. 
These rules are not language-aware.</div><div>There is no zero-width character in Unicode to join words.</div><div>It is impossible for TR29 to distinguish between U+2019 used as a quotation mark and U+2019 used as an elision mark.</div><div>There is no other character that is an appropriate replacement for U+2019.</div><div><br></div><div>I haven’t yet looked at how the folding rules of Unicode TR30 pertain to this.<br><div><br></div><div>In Him,</div><div><span style="white-space:pre" class="Apple-tab-span"> </span>DM</div><div><br></div><div><div><br><blockquote type="cite"><div>On Mar 17, 2025, at 8:46 AM, David Haslam &lt;dfhdfh@protonmail.com&gt; wrote:</div><br class="Apple-interchange-newline"><div><div style="font-family: Arial, sans-serif; font-size: 14px;">Dear SWORD developers,</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">I asked about this topic several years ago, and I'm no longer convinced by what we were told back then.<br>
<br>
After doing further research, it's my understanding that <b>U+2019 RIGHT SINGLE QUOTATION MARK</b> ought <b><u>not</u></b> to be hidden by the SWORD Greek Accents filter.<br>
<br></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><ol style="margin-top: 0px; margin-bottom: 0px;" data-editing-info="{&quot;orderedStyleType&quot;:1,&quot;unorderedStyleType&quot;:1}"><li style="list-style-type: &quot;1. &quot;;"><span>
This codepoint is <u>not</u> a diacritic that modifies the previous Greek letter. In other words, it's <b><u>not</u></b> a Greek accent.<br></span></li><li style="list-style-type: &quot;2. &quot;;"><span>This codepoint has the Unicode properties of a <b>punctuation mark</b> (General Category Pf, Final_Punctuation).</span></li><li style="list-style-type: &quot;3. &quot;;"><span>In Ancient Greek text, it's used to mark an <b>elision</b>, where the final vowel of a word is omitted when the next word begins with a vowel.</span></li></ol></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">
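<div style="font-family: Arial, sans-serif; font-size: 14px;">Points 1 and 2 can be checked against the Unicode Character Database. Below is an illustrative sketch using Python's standard <code>unicodedata</code> module, with a hypothetical helper <code>describe</code>. For U+2019 and for three combining marks used in polytonic Greek, it prints the code point, name, General Category and canonical combining class.</div>
<pre>
```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, name, General Category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# U+2019 versus acute accent, smooth breathing (comma above), perispomeni.
for ch in ("\u2019", "\u0301", "\u0313", "\u0342"):
    print(describe(ch))
# First line printed: ('U+2019', 'RIGHT SINGLE QUOTATION MARK', 'Pf', 0)

# A precomposed Greek letter decomposes into base letter + combining mark,
# whereas U+2019 has no decomposition at all.
print(unicodedata.normalize("NFD", "\u1f21") == "\u03b7\u0314")  # True
print(unicodedata.decomposition("\u2019") == "")  # True
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">The accents come out as category Mn (non-spacing marks) with a non-zero combining class, while U+2019 is a spacing punctuation character with combining class 0.</div>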
To view my research, conducted with the help of <b>Grok 3</b>, please visit the following link.</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><ul style="margin-top: 0px; margin-bottom: 0px;" data-editing-info="{&quot;orderedStyleType&quot;:1,&quot;unorderedStyleType&quot;:1}"><li style="list-style-type: disc;"><span><a href="https://grok.com/share/bGVnYWN5_43ff1922-3876-4d9a-9e42-6ae940007fd0" rel="noreferrer nofollow noopener" target="_blank">https://grok.com/share/bGVnYWN5_43ff1922-3876-4d9a-9e42-6ae940007fd0</a></span><br></li></ul></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">
I therefore recommend that SWORD developers revisit the specification for this filter, and update it so that <b>U+2019</b> is <u>never</u> hidden.<br>
<br>
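<div style="font-family: Arial, sans-serif; font-size: 14px;">Here is a minimal sketch of the proposed behaviour. For illustration only, it assumes that hiding accents amounts to stripping combining marks after NFD decomposition. The real SWORD filter is written in C++, and its internals are not shown in this thread. <code>hide_greek_accents</code> and its <code>keep_elision</code> flag are hypothetical.</div>
<pre>
```python
import unicodedata

ELISION_MARK = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def hide_greek_accents(text: str, keep_elision: bool = True) -> str:
    """Hypothetical display filter that strips all combining marks.

    keep_elision=True passes U+2019 through (the proposed behaviour);
    keep_elision=False removes it too.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            continue  # accents, breathings, diaeresis, iota subscript
        if ch == ELISION_MARK and not keep_elision:
            continue
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(hide_greek_accents("δ\u2019 ἀρχαῖα") == "δ\u2019 αρχαια")  # True
print(hide_greek_accents("δ\u2019 ἀρχαῖα", keep_elision=False) == "δ αρχαια")  # True
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">With <code>keep_elision=True</code>, a closing single quotation mark, such as those ending level-2 quotations in StatResGNT, also survives. The filter cannot tell the two uses of U+2019 apart.</div>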
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block">
<div class="protonmail_signature_block-user">
Best regards,<br><br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
<div class="protonmail_signature_block-proton">
Sent with <a href="https://pr.tn/ref/SWXT9A5YZ67G" target="_blank" rel="noreferrer nofollow noopener">Proton Mail</a> secure email.
</div>
</div>
</div>_______________________________________________<br>sword-devel mailing list: sword-devel@crosswire.org<br>http://crosswire.org/mailman/listinfo/sword-devel<br>Instructions to unsubscribe/change your settings at above page<br></div></blockquote></div><br></div></div></div></div>
</blockquote><br>
</div></div></blockquote></div><br></div>
</blockquote><br>
</div>
</blockquote><br>
</div></div></blockquote></div><br></div></div></body></html>