<div style="font-family: Arial, sans-serif; font-size: 14px;">NB. I have cancelled the earlier email because the attachment was too large for <b>sword-devel</b>. <br><i>It had been in the queue for moderator approval</i>.<br><br>The e<b>X</b>perimental module <b>KhmerNTx.zip</b> may now be downloaded from this <a href="https://app.box.com/s/e613wf1qdxbjmvux9gbb6vmes33d2rol" title="link">link</a> on my <b>box.net</b> account.</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><i>Please see below for the significant details</i>.</div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
<div class="protonmail_signature_block" style="font-family: Arial, sans-serif; font-size: 14px;">
<div class="protonmail_signature_block-user">
Best regards,<br><br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div>
<div class="protonmail_signature_block-proton">
Sent with <a target="_blank" href="https://pr.tn/ref/SWXT9A5YZ67G">Proton Mail</a> secure email.
</div>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br><div class="protonmail_quote">
------- Forwarded Message -------<br>
From: David Haslam &lt;dfhdfh@protonmail.com&gt;<br>
Date: On Thursday, May 29th, 2025 at 9:26 AM<br>
Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS<br>
To: sword-devel mailing list &lt;sword-devel@crosswire.org&gt;<br>
CC: steve.antioch@gmail.com &lt;steve.antioch@gmail.com&gt;, Modules Issues &lt;modules@crosswire.org&gt;<br><br>
<blockquote class="protonmail_quote" type="cite">
<div style="font-family: Arial, sans-serif; font-size: 14px;">Dear SWORD Developers (and our Modules Team),<br><br>While watching the <a title="livestream funeral" href="https://www.youtube.com/live/zC4hXOgqBak?si=JZ7JiM7j_fHW-sQl" target="_blank" rel="noreferrer nofollow noopener">livestream funeral</a> of the late OT scholar <b>Gordon D Wenham</b> yesterday (St Mary's Church, Charlton Kings), I had a bright idea.<br><br>I'd been working recently on potential improvements to the <b>KhmerNT</b> module relating to marking the <b>Lexical Word Divisions</b>.<br><b>Khmer</b> is one of the languages of SE Asia whose <b>Writing System</b> (aka Script) largely has <span><b>NO SPACE BETWEEN WORDS</b>.<br>Others include: <b>Lao</b>, <b>Thai</b>, <b>Myanmar</b> (aka Burmese), together with other languages in the region that employ one of these scripts (e.g. <b>Isaan</b>).</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br>
Until now, the <b>KhmerNT</b> module has used ZWSP (<b>Zero Width Space</b>, U+200B) to mark lexical word boundaries.<br>This helps with SWORD search for <b>whole words</b>: although the divisions between words are invisible to human eyes, they are accessible to computer software.<br><br>Wouldn't it be nice if ... (cue the melody by the <b>Beach Boys</b>) 🎶<br></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><ol data-listchain="__List_Chain_87" style="margin-top: 0px; margin-bottom: 0px;" data-editing-info="{&quot;orderedStyleType&quot;:1,&quot;unorderedStyleType&quot;:1}"><li style="list-style-type: &quot;1. &quot;;">We could instead use a <i><b>visible</b></i> Unicode character</li><li style="list-style-type: &quot;2. &quot;;">That character could be <b><i>hidden</i></b> by means of an <i><b>existing</b></i> SWORD filter</li></ol></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><span style="font-size: 13.5pt; line-height: normal;">There is such a character!!!</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><ul style="margin-top: 0px; margin-bottom: 0px;" data-editing-info="{&quot;orderedStyleType&quot;:1,&quot;unorderedStyleType&quot;:1}"><li style="list-style-type: disc;"><span><b>U+2019</b> <span>is one of the codepoints hidden (or changed) by the filter <b>UTF8GreekAccents</b>.</span></span></li></ul></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><span><br></span></div><blockquote style="border-left: 3px solid rgb(200, 200, 200); border-top-color: rgb(200, 200, 200); border-right-color: rgb(200, 200, 200); border-bottom-color: rgb(200, 200, 200); padding-left: 10px; color: rgb(102, 102, 102);"><div style="font-family: Arial, sans-serif; font-size: 14px;"><span><span>U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of the <b>NT Greek</b> as the apostrophe, not as a quotation
mark.</span><div><br></div><div><span>In <b>NT Greek</b>, it appears in:</span></div><div><br></div><div><span>- <b>Elisions</b>: When a vowel at the end of a word is dropped (e.g., δι’ instead of διά before a vowel).</span></div><div><span>- <b>Contractions or abbreviations</b>: e.g., ἐπ’ for ἐπί, καθ’ for κατά.</span></div><div><br></div><span>While U+2019 is typographically correct for apostrophes in modern typesetting, some older or simpler digital texts may use U+0027 (straight apostrophe). However, U+2019 is the preferred character in high-quality, properly typeset Greek texts.</span><br></span></div></blockquote><div style="font-family: Arial, sans-serif; font-size: 14px;"><span><br></span></div><div style="font-family: Arial, sans-serif; font-size: 14px;">I then set about testing my idea by making a further update to an already e<b>X</b>perimental version of the module, provisionally named <b>KhmerNTx</b>.<br><br><span style="font-size: 15pt; line-height: normal;">It "worked like a dream".</span></div><div style="font-family: Arial, sans-serif; font-size: 14px;"><br></div><div style="font-family: Arial, sans-serif; font-size: 14px;">With <b>Greek accents</b> <i>hidden</i>, the text looks like this:</div><blockquote style="border-left: 3px solid rgb(200, 200, 200); border-top-color: rgb(200, 200, 200); border-right-color: rgb(200, 200, 200); border-bottom-color: rgb(200, 200, 200); padding-left: 10px; color: rgb(102, 102, 102);"><div style="font-family: Arial, sans-serif; font-size: 14px;"><span style="font-size: 13.5pt; line-height: normal;">[Khmer verse text lost to an encoding error; no word dividers are visible] (I Peter 1:1 [KhmerNTx])</span><br></div></blockquote><div style="font-family: Arial, sans-serif; font-size: 14px;"><span style="display: inline !important; background-color: rgb(255, 255, 255);"><br></span></div><div style="font-family: Arial, sans-serif; font-size: 14px;">
<span style="display: inline !important; background-color: rgb(255, 255, 255);">With <b>Greek accents</b> <i>displayed</i>, the text looks like this:</span><br>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block">
<blockquote style="border-left: 3px solid rgb(200, 200, 200); border-top-color: rgb(200, 200, 200); border-right-color: rgb(200, 200, 200); border-bottom-color: rgb(200, 200, 200); padding-left: 10px; color: rgb(102, 102, 102);"><div class="protonmail_signature_block-user"><span style="font-size: 13.5pt; line-height: normal;">[Khmer verse text lost to an encoding error; U+2019 (’) appears between the words] (I Peter 1:1 [KhmerNTx])</span><br></div></blockquote><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user">I have attached the compressed module for any of you to explore &amp; play with further.</div><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user"><u>Aside</u>: The previous update already made use of the OSIS XML <b>w</b> element to enclose each lexical Khmer word. That remains the case.<br>In this way, the module source text is ready to be adapted for further enhancements such as adding <b>Strong's</b> numbers, etc., to make a <b>Study Edition</b>.</div><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user"><b>Steve Hyde</b> and the translators in <b>Cambodia</b> are currently preparing to publish the complete <b>Khmer Bible</b>.<br>He has requested my assistance in improving the actual word divisions for the 39 OT books.<br>I've already been sent the source text, exported from their database.<br><br>Since early May, I have been exploring how the <b>Grok AI</b> engine can make a positive contribution to the success of this challenging task.<br><i>More on that subject later</i>.</div><div class="protonmail_signature_block-user"><br></div><div class="protonmail_signature_block-user">
Best regards,<br><br>David
</div>
</div>
</div>
</blockquote><br>
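<div style="font-family: Arial, sans-serif; font-size: 14px;">The divider scheme described in the forwarded message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SWORD implementation: the function names are hypothetical, the Latin tokens stand in for Khmer words, and <code>hide_dividers</code> merely imitates the effect the message attributes to the <b>UTF8GreekAccents</b> filter on U+2019.</div>
<pre>
```python
# Illustrative sketch only; names and sample text are assumptions.
ZWSP = "\u200b"     # ZERO WIDTH SPACE, the invisible word boundary
DIVIDER = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the visible divider


def mark_words(text):
    """Swap each invisible ZWSP for the visible U+2019 divider."""
    return text.replace(ZWSP, DIVIDER)


def hide_dividers(text):
    """Imitate the 'accents hidden' view by dropping every U+2019."""
    return text.replace(DIVIDER, "")


def lexical_words(text):
    """Split marked text into lexical words, on spaces and on dividers."""
    words = []
    for chunk in text.split():
        words.extend(w for w in chunk.split(DIVIDER) if w)
    return words


# Latin tokens stand in for Khmer words; ZWSP marks the hidden boundaries.
sample = "alpha" + ZWSP + "beta gamma" + ZWSP + "delta"
marked = mark_words(sample)

print(marked)                 # dividers displayed
print(hide_dividers(marked))  # dividers hidden
print(lexical_words(marked))  # the units a whole-word search would match
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">Keeping the divider in the stored text and removing it only at display time means the word boundaries stay available to software, such as whole-word search, whichever view the reader chooses.</div>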
</div></div>