<html><head></head><body> <div dir="auto">In the eXperimental module, every word is already enclosed in a w element.</div><div dir="auto"><br></div><div dir="auto">There’s been no discussion of how SWORD might render consecutive words differently.</div><div dir="auto"><br></div><div dir="auto">Compare: we still don’t have an agreed implementation for Morph Segmentation.</div><div dir="auto"><br></div><div dir="auto">IIRC, David Instone-Brewer once suggested that alternating colours might be a way forward, but AFAICT, such suggestions have always fallen by the wayside in the SWORD developer community.</div><div dir="auto"><br></div><div dir="auto">Please expand on what you had in mind, Peter.</div><div dir="auto"><br></div><div dir="auto">Kind regards,</div><div dir="auto"><br></div><div dir="auto">David</div> <div><br></div><div><br></div>On Thu, May 29, 2025 at 17:27, Peter von Kaehne &lt;<a class="" href="mailto:refdoc@gmx.net">refdoc@gmx.net</a>&gt; wrote:<blockquote type="cite" class="protonmail_quote"> <div dir="ltr">
I think this has been discussed well.
</div>
<div dir="ltr">
<br>
</div>
<ol data-editing-info='{"applyListStyleFromLevel":false,"orderedStyleType":3}' start="1">
<li style='list-style-type: "1) ";'>
<div dir="ltr">
This should be done at a semantic level, not with a kludge or a hack.
</div></li>
<li style='list-style-type: "2) ";'>
<div dir="ltr">
The obvious semantic solution is to frame words in w tags and then use CSS, trigger an option, or do whatever is agreed from there.
</div>
<div dir="ltr">
<br>
</div>
<div dir="ltr">
<br>
</div></li>
</ol>
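<div dir="ltr">
A minimal Python sketch of point 2, assuming each word is already framed in a w element and that a front end wants alternating styles for consecutive words. The function name, the "words" wrapper and the class names "word-a" and "word-b" are hypothetical; nothing here is an agreed SWORD mechanism. Latin placeholder words stand in for real text.
</div>
<pre>
```python
import xml.etree.ElementTree as ET


def alternate_word_classes(fragment):
    """Turn each w element into an HTML span with an alternating class name."""
    source = ET.fromstring(fragment)
    target = ET.Element("span", {"class": "verse"})
    for index, word in enumerate(source.iter("w")):
        span = ET.SubElement(target, "span")
        # "word-a" and "word-b" are hypothetical class names for a stylesheet.
        span.set("class", "word-a" if index % 2 == 0 else "word-b")
        span.text = word.text
    return ET.tostring(target, encoding="unicode")


# Build a small sample fragment: a hypothetical "words" wrapper around w elements.
sample = ET.Element("words")
for text in ["alpha", "beta", "gamma"]:
    ET.SubElement(sample, "w").text = text
print(alternate_word_classes(ET.tostring(sample, encoding="unicode")))
```
</pre>
<div dir="ltr">
A stylesheet could then colour the two classes differently. The sketch drops any text that sits outside the w elements (spaces, punctuation), which a real renderer would have to keep.
</div>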
<div dir="ltr" id="ms-outlook-mobile-body-separator-line">
<br>
</div>
<div class="ms-outlook-mobile-reference-message" id="mail-editor-reference-message-container">
<hr style="display: inline-block; width: 98%;">
<div dir="ltr" id="divRplyFwdMsg">
<span style="font-family: Calibri, sans-serif;"><b>From:</b> sword-devel &lt;sword-devel-bounces@crosswire.org&gt; on behalf of David Haslam &lt;dfhdfh@protonmail.com&gt;<br><b>Sent:</b> Thursday, May 29, 2025 3:47 pm<br><b>To:</b> sword-devel mailing list &lt;sword-devel@crosswire.org&gt;<br><b>Cc:</b> Modules Issues &lt;modules@crosswire.org&gt;; steve.antioch@gmail.com &lt;steve.antioch@gmail.com&gt;<br><b>Subject:</b> [sword-devel] Fw: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS</span>
<div style="font-family: Calibri, sans-serif;">
</div>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;">
NB. I have cancelled the earlier email because the attachment was too large for
<b>sword-devel</b>.
<br>
<i>It had been in the queue for moderator approval</i>.
<br>
<br>The e
<b>X</b>perimental module
<b>KhmerNTx.zip</b> may now be downloaded from this
<a title="link" href="https://app.box.com/s/e613wf1qdxbjmvux9gbb6vmes33d2rol">link</a> on my
<b>box.net</b> account.
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;">
<i>Please see below for the significant details</i>.
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user">
Best regards,
<br>
<br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
------- Forwarded Message -------
<br> From: David Haslam &lt;dfhdfh@protonmail.com&gt;
<br> Date: On Thursday, May 29th, 2025 at 9:26 AM
<br> Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
<br> To: sword-devel mailing list &lt;sword-devel@crosswire.org&gt;
<br> CC: steve.antioch@gmail.com &lt;steve.antioch@gmail.com&gt;, Modules Issues &lt;modules@crosswire.org&gt;
<br>
<br>
</div>
<blockquote>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
Dear SWORD Developers (and our Modules Team),
<br>
<br>While watching the
<a rel="noreferrer nofollow noopener" title="livestream funeral" href="https://www.youtube.com/live/zC4hXOgqBak?si=JZ7JiM7j_fHW-sQl">livestream funeral</a> of the late OT scholar
<b>Gordon D Wenham</b> yesterday (St Mary's Church, Charlton Kings), I had a bright idea.
<br>
<br>I'd been working recently on potential improvements for the
<b>KhmerNT</b> module relating to marking the
<b>Lexical Word Divisions</b>.
<br>
<b>Khmer</b> is one of the languages of SE Asia whose
<b>Writing System</b> (aka Script) largely has
<b>NO SPACE BETWEEN WORDS</b>.
<br>Others include:
<b>Lao</b>,
<b>Thai</b>,
<b>Myanmar</b> (aka Burmese), together with other languages in the region that employ one of these scripts (e.g.
<b>Isaan</b>).
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
<br> Until now, the
<b>KhmerNT</b> module has used the ZWSP (U+200B
<b>Zero Width Space</b>) to mark lexical word boundaries.
<br>This helps with SWORD search for
<b>whole words</b>, because even though the divisions between words are invisible to human eyes, they are accessible to computer software.
<br>
<br>Wouldn't it be nice if ... (cue to sing the melody by the
<b>Beach Boys</b>) 🎶
</div>
<ol style="margin-top: 0px; margin-bottom: 0px;" start="1">
<li style='font-family: Arial, sans-serif; font-size: 14px; list-style-type: "1. ";'>We could instead use a <b><i>visible</i></b> Unicode character</li>
<li style='font-family: Arial, sans-serif; font-size: 14px; list-style-type: "2. ";'>That character could be <b><i>hidden</i></b> by means of an <b><i>existing</i></b> SWORD filter</li>
</ol>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 13.5pt;" class="protonmail_quote">
<span style="line-height: normal;">There is such a character!!!</span>
</div>
<ul style="margin-top: 0px; margin-bottom: 0px;">
<li style="font-family: Arial, sans-serif; font-size: 14px; list-style-type: disc;"><b>U+2019</b> is one of the codepoints hidden (or changed) by the filter <b>UTF8GreekAccents</b>.</li>
</ul>
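<div style="font-family: Arial, sans-serif; font-size: 14px;">
A minimal Python sketch of the idea in the two lists above, assuming the divider is simply removed for display. The names hide_dividers and split_words are hypothetical. This is not the code of UTF8GreekAccents, whose actual handling of U+2019 (hidden or changed) is not reproduced here. Latin placeholder words stand in for Khmer.
</div>
<pre>
```python
# Hypothetical helper names; this is not SWORD's UTF8GreekAccents filter.
ZWSP = "\u200b"     # ZERO WIDTH SPACE, the invisible divider used so far
DIVIDER = "\u2019"  # RIGHT SINGLE QUOTATION MARK, the proposed visible divider


def hide_dividers(text, divider=DIVIDER):
    """Return text with every divider character removed (the 'hidden' view)."""
    return text.replace(divider, "")


def split_words(text, divider=DIVIDER):
    """Split a run of text into lexical words at the divider character."""
    return [word for word in text.split(divider) if word]


# Latin placeholders stand in for Khmer words so the sketch stays readable.
marked = DIVIDER.join(["alpha", "beta", "gamma"])
print(hide_dividers(marked))
print(split_words(marked))
# The same splitting works for the existing ZWSP-marked text.
print(split_words(ZWSP.join(["alpha", "beta"]), divider=ZWSP))
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">
Either divider gives software the same word list. The difference is only whether a reader can choose to see the divisions.
</div>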
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote" dir="ltr">
<br>
</div>
<blockquote style="padding-left: 10px; border-left-width: 3px; border-left-style: solid; border-left-color: rgb(200, 200, 200);">
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote">
U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of the
<b>NT Greek</b> as the apostrophe, not as a quotation mark.
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote">
In
<b>NT Greek</b>, it appears in:
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote">
-
<b>Elisions</b>: When a vowel at the end of a word is dropped (e.g., δι’ instead of διά before a vowel).
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote">
-
<b>Contractions or abbreviations</b>: e.g., ἐπ’ for ἐπί, καθ’ for κατά.
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px; color: rgb(102, 102, 102);" class="protonmail_quote">
While U+2019 is typographically correct for apostrophes in modern typesetting, some older or simpler digital texts may use U+0027 (straight apostrophe). However, U+2019 is the preferred character in high-quality, properly typeset Greek texts.
</div>
</blockquote>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
I then set about to test my idea by making a further update to an already e
<b>X</b>perimental version of the module, provisionally named
<b>KhmerNTx</b>.
<br>
<br>
<span style="font-size: 15pt; line-height: normal;">It "worked like a dream".</span>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
With
<b>Greek accents</b>
<i>hidden</i>, the text looks like this:
</div>
<blockquote style="padding-left: 10px; border-left-width: 3px; border-left-style: solid; border-left-color: rgb(200, 200, 200);">
<div style="font-family: Arial, sans-serif; font-size: 13.5pt; color: rgb(102, 102, 102);" class="protonmail_quote">
<span style="line-height: normal;">[Khmer text not recoverable in this copy: the verse with no visible dividers between words] (I Peter 1:1 [KhmerNTx])</span>
</div>
</blockquote>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote" dir="ltr">
<span style="background-color: rgb(255, 255, 255);"><br></span>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
<span style="background-color: rgb(255, 255, 255);">With <b>Greek accents</b> <i>displayed</i>, the text looks like this:</span>
<br>
</div>
<blockquote style="padding-left: 10px; border-left-width: 3px; border-left-style: solid; border-left-color: rgb(200, 200, 200);">
<div style="font-family: Arial, sans-serif; font-size: 13.5pt; color: rgb(102, 102, 102);" class="protonmail_signature_block-user">
<span style="line-height: normal;">[Khmer text not recoverable in this copy: the same verse with U+2019 (’) visible between words] (I Peter 1:1 [KhmerNTx])</span>
</div>
</blockquote>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user">
I have attached the compressed module for any of you to explore and play with further.
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user">
<u>Aside</u>: The previous update already made use of the OSIS XML
<b>w</b> element to enclose each lexical Khmer word. That remains the case.
<br>In this way, the module source text is ready to be adapted
<span style="background-color: rgb(255, 255, 255);"> for </span>further enhancements such as adding
<b>Strong's</b> numbers, etc, to make a
<b>Study Edition</b>.
</div>
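<div style="font-family: Arial, sans-serif; font-size: 14px;">
The Aside above can be sketched in Python as follows, assuming the source text marks word boundaries with U+2019. The function name and the "words" wrapper element are hypothetical, chosen only for the demo. The actual module source and its conversion tooling are not shown in this message. Latin placeholder words stand in for Khmer.
</div>
<pre>
```python
import xml.etree.ElementTree as ET

DIVIDER = "\u2019"  # assumed word divider, as proposed in this message


def words_to_w_elements(text, divider=DIVIDER):
    """Build a container element holding one w element per lexical word."""
    container = ET.Element("words")  # hypothetical wrapper, for the demo only
    for word in text.split(divider):
        if word:
            ET.SubElement(container, "w").text = word
    return container


# Latin placeholders stand in for Khmer words.
element = words_to_w_elements(DIVIDER.join(["alpha", "beta"]))
print(ET.tostring(element, encoding="unicode"))
```
</pre>
<div style="font-family: Arial, sans-serif; font-size: 14px;">
Once every word has its own element, later enhancements can be attached per word as attributes without touching the text itself.
</div>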
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user">
<b>Steve Hyde</b> and the translators in
<b>Cambodia</b> are currently preparing to publish the complete
<b>Khmer Bible</b>.
<br>He has requested my assistance in improving the actual word divisions for the 39 OT books.
<br>I've already been sent the source text, exported from their database.
<br>
<br>Since early May, I have been exploring how the
<b>Grok AI</b> engine can make a positive contribution to the success of this challenging task.
<br>
<i>More on that subject later</i>.
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user" dir="ltr">
<br>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block-user">
Best regards,
<br>
<br>David
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_signature_block" dir="ltr">
<br>
</div>
</blockquote>
<div style="font-family: Arial, sans-serif; font-size: 14px;" class="protonmail_quote">
<br>
</div>
</div>
<div>
</div>
<p>_______________________________________________<br> sword-devel mailing list: sword-devel@crosswire.org<br> http://crosswire.org/mailman/listinfo/sword-devel<br> Instructions to unsubscribe/change your settings at above page<br> </p></blockquote></body></html>