<html><body><div dir="ltr">I think this has been discussed well. </div><div dir="ltr"><br></div><ol start="1" data-editing-info='{"applyListStyleFromLevel":false,"orderedStyleType":3}'><li style="list-style-type: '1) ';"><div dir="ltr">This should be done at the semantic level, not with a kludge or a hack. </div></li><li style="list-style-type: '2) ';"><div dir="ltr">The obvious semantic solution is to wrap each word in a w element and then use CSS, a trigger and option, or whatever is agreed from there. </div><div dir="ltr"><br></div><div dir="ltr"><br></div></li></ol><div id="ms-outlook-mobile-body-separator-line" dir="ltr"><br></div><div id="ms-outlook-mobile-signature">Sent from <a href="https://aka.ms/o0ukef">Outlook for iOS</a></div><div id="mail-editor-reference-message-container" class="ms-outlook-mobile-reference-message"><hr style="display: inline-block; width: 98%;"><div id="divRplyFwdMsg" dir="ltr"><span style="font-family: Calibri, sans-serif;"><b>From:</b> sword-devel &lt;sword-devel-bounces@crosswire.org&gt; on behalf of David Haslam &lt;dfhdfh@protonmail.com&gt;<br><b>Sent:</b> Thursday, May 29, 2025 3:47 pm<br><b>To:</b> sword-devel mailing list &lt;sword-devel@crosswire.org&gt;<br><b>Cc:</b> Modules Issues &lt;modules@crosswire.org&gt;; steve.antioch@gmail.com &lt;steve.antioch@gmail.com&gt;<br><b>Subject:</b> [sword-devel] Fw: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS</span><div style="font-family: Calibri, sans-serif;"> </div></div><div style="font-family: Arial, sans-serif; font-size: 14px;">NB. I have cancelled the earlier email because the attachment was too large for <b>sword-devel</b>. 
<br><i>It had been in the queue for moderator approval</i>.<br><br>The e<b>X</b>perimental module <b>KhmerNTx.zip</b> may now be downloaded from this <a href="https://app.box.com/s/e613wf1qdxbjmvux9gbb6vmes33d2rol" title="link">link</a> on my <b>box.net</b> account.</div><div dir="ltr" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><i>Please see below for the significant details</i>.</div><div dir="ltr" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;">
Best regards,<br><br>David
</div><div dir="ltr" class="protonmail_signature_block" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-proton" style="font-family: Arial, sans-serif; font-size: 14px;">
Sent with <a href="https://pr.tn/ref/SWXT9A5YZ67G">Proton Mail</a> secure email.
</div><div dir="ltr" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;">
------- Forwarded Message -------<br>
From: David Haslam &lt;dfhdfh@protonmail.com&gt;<br>
Date: On Thursday, May 29th, 2025 at 9:26 AM<br>
Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS<br>
To: sword-devel mailing list &lt;sword-devel@crosswire.org&gt;<br>
CC: steve.antioch@gmail.com &lt;steve.antioch@gmail.com&gt;, Modules Issues &lt;modules@crosswire.org&gt;<br><br>
</div><blockquote><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;">Dear SWORD Developers (and our Modules Team),<br><br>While watching the <a href="https://www.youtube.com/live/zC4hXOgqBak?si=JZ7JiM7j_fHW-sQl" title="livestream funeral" rel="noreferrer nofollow noopener">livestream funeral</a> of OT Scholar the late <b>Gordon D Wenham</b> yesterday (St Mary's Church, Charlton Kings), I had a bright idea.<br><br>I'd been working recently on potential improvements for the <b>KhmerNT</b> module relating to marking the <b>Lexical Word Divisions</b>.<br><b>Khmer</b> is one of the languages of SE Asia whose <b>Writing System</b> (aka Script) largely has <b>NO SPACE BETWEEN WORDS</b>.<br>Others include: <b>Lao</b>, <b>Thai</b>, <b>Myanmar</b> (aka Burmese), together with other languages in the region that employ one of these scripts (e.g. <b>Isaan</b>).</div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><br>
Until now, the <b>KhmerNT</b> module has used the ZWSP (<b>Zero Width Space</b>, U+200B) to mark lexical word boundaries.<br>This helps with SWORD search for <b>whole words</b>: although the divisions between words are invisible to human eyes, they are accessible to computer software.<br><br>Wouldn't it be nice if ... (cue the melody by the <b>Beach Boys</b>) 🎶</div><ol start="1" style="margin-top: 0px; margin-bottom: 0px;"><li style="font-family: Arial, sans-serif; font-size: 14px; list-style-type: '1. ';">We could instead use a <b><i>visible</i></b> Unicode character</li><li style="font-family: Arial, sans-serif; font-size: 14px; list-style-type: '2. ';">That character could be <b><i>hidden</i></b> by means of an <b><i>existing</i></b> SWORD filter</li></ol><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 13.5pt;"><span style="line-height: normal;">There is such a character!!!</span></div><ul style="margin-top: 0px; margin-bottom: 0px;"><li style="font-family: Arial, sans-serif; font-size: 14px; list-style-type: disc;"><b>U+2019</b> is one of the codepoints hidden (or changed) by the filter <b>UTF8GreekAccents</b>.</li></ul><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><blockquote style="padding-left: 10px; border-left-width: 3px; border-left-style: solid; border-left-color: rgb(200, 200, 200);"><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);">U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of <b>NT Greek</b> as the apostrophe, not as a quotation mark.</div><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);"><br></div><div class="protonmail_quote" style="font-family: Arial, 
sans-serif; font-size: 14px; color: rgb(102, 102, 102);">In <b>NT Greek</b>, it appears in:</div><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);"><br></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);">- <b>Elisions</b>: When a vowel at the end of a word is dropped (e.g., δι’ instead of διά before a vowel).</div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);">- <b>Contractions or abbreviations</b>: e.g., ἐπ’ for ἐπί, καθ’ for κατά.</div><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);"><br></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);">While U+2019 is typographically correct for apostrophes in modern typesetting, some older or simpler digital texts may use U+0027 (straight apostrophe). However, U+2019 is the preferred character in high-quality, properly typeset Greek texts.</div></blockquote><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;">I then set about testing my idea by making a further update to an already e<b>X</b>perimental version of the module, provisionally named <b>KhmerNTx</b>.<br><br><span style="font-size: 15pt; line-height: normal;">It "worked like a dream". 
</span></div><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;">With <b>Greek accents</b> <i>hidden</i>, the text looks like this:</div><blockquote style="padding-left: 10px; border-left-width: 3px; border-left-style: solid; border-left-color: rgb(200, 200, 200);"><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 13.5pt; color: rgb(102, 102, 102);"><span style="line-height: normal;">αααα»ααααααα»α ααΆααΆαααααααααααααααΌαααα·ααα ααΌαα
αααααα½αα’ααααααααααααΆααα
αΆααααΆαααααΎαααΎα α αΎααααααΆαααααααααααααΆαα
ααααΆαααα
ααααααα’αΆαααααα
αααα»αααα»ααα»α αααα»αααΆα‘αΆααΈ αααα»αααΆαααΆααΌααΆ αααα»αα’αΆαααΈ αα·ααααα»ααααΈααΌααΆ (I Peter 1:1 [KhmerNTx])</span></div></blockquote><div dir="ltr" class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><span style="background-color: rgb(255, 255, 255);"><br></span></div><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><span style="background-color: rgb(255, 255, 255);">With <b>Greek accents</b> <i>displayed</i>, the text looks like this:</span><br>
</div><blockquote style="padding-left: 10px; border-left-width: 3px; border-left-style: solid; border-left-color: rgb(200, 200, 200);"><div class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 13.5pt; color: rgb(102, 102, 102);"><span style="line-height: normal;">αααα»αβαααααα»α ααΆβααΆααβααααβααααβαααααΌβαααα·ααα ααΌαβα
ααααβαα½αα’αααβαααβααααααΆααα
αΆααβααΆαβααααΎαααΎα α αΎαβαααβααΆαβααααααααβααααΆβαα
βααααΆααβαα
βααααααα’αΆααααβαα
βαααα»αβααα»ααα»α αααα»αβααΆα‘αΆααΈ αααα»αβααΆαααΆααΌααΆ αααα»αβα’αΆαααΈ αα·αβαααα»αβαααΈααΌααΆ (I Peter 1:1 [KhmerNTx])</span></div></blockquote><div dir="ltr" class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;">I have attached the compressed module for any of you to explore & play with further.</div><div dir="ltr" class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;"><u>Aside</u>: The previous update already made use of the OSIS XML <b>w</b> element to enclose each lexical Khmer word. That remains the case.<br>In this way, the module source text is ready to be adapted<span style="background-color: rgb(255, 255, 255);"> for </span>further enhancements such as adding <b>Strong's</b> numbers, etc, to make a <b>Study Edition</b>.</div><div dir="ltr" class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;"><b>Steve Hyde</b> and the translators in <b>Cambodia</b> are currently preparing to publish the complete <b>Khmer Bible</b>.<br>He has requested my assistance in improving the actual word divisions for the 39 OT books.<br>I've already been sent the source text, exported from their database.<br><br>Since early May, I have been exploring how the <b>Grok AI</b> engine can make a positive contribution to the success of this challenging task.<br><i>More on that subject later</i>.</div><div dir="ltr" class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-user" style="font-family: Arial, sans-serif; font-size: 14px;">
Best regards,<br><br>David
</div><div dir="ltr" class="protonmail_signature_block" style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div class="protonmail_signature_block-proton" style="font-family: Arial, sans-serif; font-size: 14px;">
</div></blockquote><div class="protonmail_quote" style="font-family: Arial, sans-serif; font-size: 14px;"><br>
</div></div><div> </div></body></html>