[sword-devel] Fw: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
David Haslam
dfhdfh at protonmail.com
Thu May 29 12:36:32 EDT 2025
In the eXperimental module, all the words are already framed with a w element.
There’s been no discussion of how SWORD might render consecutive words differently.
cf. We still don’t have an agreed implementation for Morph Segmentation.
IIRC, David Instone-Brewer once suggested that alternating colours might be a way forward, but AFAICT, such suggestions have always fallen to the ground in the SWORD developers community.
Please expand on what you had in mind, Peter.
Kind regards,
David
On Thu, May 29, 2025 at 17:27, Peter von Kaehne <refdoc at gmx.net> wrote:
> I think this has been discussed well.
>
> - this should be done on a semantic level and not with a kludge and a hack.
> - the obvious semantic solution is to frame words in w tags and then use CSS, trigger an option, or whatever is agreed from there.
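One way the "w tags plus CSS" idea could look in practice is sketched below: each word becomes an HTML span with an alternating class, so a stylesheet can colour consecutive words differently. This is only an illustration. The function name `alternate_spans` and the class names `w-even` and `w-odd` are hypothetical, and nothing here reflects how any SWORD render filter actually works.

```python
from html import escape


def alternate_spans(words, classes=("w-even", "w-odd")):
    """Wrap each word in a span whose class cycles through `classes`.

    The class names are hypothetical; a front end would pair them with
    CSS rules such as `.w-odd { color: ... }` to tint alternate words.
    """
    return "".join(
        f'<span class="{classes[i % len(classes)]}">{escape(word)}</span>'
        for i, word in enumerate(words)
    )


if __name__ == "__main__":
    # Three already-segmented words, e.g. the text content of three w elements.
    print(alternate_spans(["ជា", "សាវក", "របស់"]))
```

Because the styling lives in the stylesheet, the module text itself would carry no extra characters; whether the colouring is shown could then be a front-end option.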
>
> Sent from [Outlook for iOS](https://aka.ms/o0ukef)
> ---------------------------------------------------------------
>
> From: sword-devel <sword-devel-bounces at crosswire.org> on behalf of David Haslam <dfhdfh at protonmail.com>
> Sent: Thursday, May 29, 2025 3:47 pm
> To: sword-devel mailing list <sword-devel at crosswire.org>
> Cc: Modules Issues <modules at crosswire.org>; steve.antioch at gmail.com <steve.antioch at gmail.com>
> Subject: [sword-devel] Fw: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
>
> NB. I have cancelled the earlier email because the attachment was too large for sword-devel.
> It had been in the queue for moderator approval.
>
> The eXperimental module KhmerNTx.zip may now be downloaded from this [link](https://app.box.com/s/e613wf1qdxbjmvux9gbb6vmes33d2rol) on my box.net account.
>
> Please see below for the significant details.
>
> Best regards,
>
> David
>
> Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.
>
> ------- Forwarded Message -------
> From: David Haslam <dfhdfh at protonmail.com>
> Date: On Thursday, May 29th, 2025 at 9:26 AM
> Subject: Repurposing U+2019 RIGHT SINGLE QUOTATION MARK as a Lexical Word Divider for the SE Asian scripts that have NO SPACE BETWEEN WORDS
> To: sword-devel mailing list <sword-devel at crosswire.org>
> CC: steve.antioch at gmail.com <steve.antioch at gmail.com>, Modules Issues <modules at crosswire.org>
>
>> Dear SWORD Developers (and our Modules Team),
>>
>> While watching the [livestream funeral](https://www.youtube.com/live/zC4hXOgqBak?si=JZ7JiM7j_fHW-sQl) of OT Scholar the late Gordon D Wenham yesterday (St Mary's Church, Charlton Kings), I had a bright idea.
>>
>> I'd been working recently on potential improvements for the KhmerNT module relating to marking the Lexical Word Divisions.
>> Khmer is one of the languages of SE Asia whose Writing System (aka Script) largely has NO SPACE BETWEEN WORDS.
>> Others include: Lao, Thai, Myanmar (aka Burmese), together with other languages in the region that employ one of these scripts (e.g. Isaan).
>>
>> Until now, the KhmerNT module has used the ZWSP (Zero Width Space, U+200B) to mark lexical word boundaries.
>> This helps with SWORD search for whole words, because even though the divisions between words are invisible to human eyes, they are accessible to computer software.
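As a rough illustration of why invisible ZWSP boundaries help whole-word search, the sketch below splits text on U+200B as well as ordinary spaces and then tests for an exact word match. It is not SWORD's search code; the helper names `words` and `contains_word` are made up for this example.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, invisible when rendered


def words(text: str) -> list[str]:
    """Split text into lexical words on ordinary whitespace and on ZWSP."""
    tokens = []
    for chunk in text.split():  # str.split() does not treat ZWSP as whitespace
        tokens.extend(piece for piece in chunk.split(ZWSP) if piece)
    return tokens


def contains_word(text: str, word: str) -> bool:
    """Whole-word match: true only if `word` is exactly one of the tokens."""
    return word in words(text)


# Four words in two phrases; ZWSP inside a phrase, a visible space between them.
parts = ["ខ្ញុំ", "ពេត្រុស", "ជា", "សាវក"]
sample = ZWSP.join(parts[:2]) + " " + ZWSP.join(parts[2:])

if __name__ == "__main__":
    print(words(sample))
    print(contains_word(sample, parts[2]))
```

Without the ZWSP characters, each phrase would be a single unbroken token and a whole-word search for one of its words would find nothing.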
>>
>> Wouldn't it be nice if ... (cue to sing the melody by the Beach Boys) 🎶
>>
>> - We could instead use a visible Unicode character
>> - That character could be hidden by means of an existing SWORD filter
>>
>> There is such a character!!!
>>
>> - U+2019 is one of the codepoints hidden (or changed) by the filter UTF8GreekAccents.
>>
>>> U+2019 (RIGHT SINGLE QUOTATION MARK) is commonly used in digital editions of the NT Greek as the apostrophe, not as a quotation mark.
>>>
>>> In NT Greek, it appears in:
>>>
>>> - Elisions: When a vowel at the end of a word is dropped (e.g., δι’ instead of διά before a vowel).
>>> - Contractions or abbreviations: e.g., ἐπ’ for ἐπί, καθ’ for κατά.
>>>
>>> While U+2019 is typographically correct for apostrophes in modern typesetting, some older or simpler digital texts may use U+0027 (straight apostrophe). However, U+2019 is the preferred character in high-quality, properly typeset Greek texts.
>>
>> I then set about testing my idea by making a further update to an already eXperimental version of the module, provisionally named KhmerNTx.
>>
>> It "worked like a dream". 😎
>>
>> With Greek accents hidden, the text looks like this:
>>
>>> ខ្ញុំពេត្រុស ជាសាវករបស់ព្រះយេស៊ូគ្រិស្ដ ជូនចំពោះពួកអ្នកដែលព្រះជាម្ចាស់បានជ្រើសរើស ហើយដែលបានបែកខ្ញែកគ្នាទៅស្នាក់នៅបណ្ដោះអាសន្ននៅស្រុកប៉ុនតុស ស្រុកកាឡាទី ស្រុកកាប៉ាដូគា ស្រុកអាស៊ី និងស្រុកប៉ីធូនា (I Peter 1:1 [KhmerNTx])
>>
>> With Greek accents displayed, the text looks like this:
>>
>>> ខ្ញុំ’ពេត្រុស ជា’សាវក’របស់’ព្រះ’យេស៊ូ’គ្រិស្ដ ជូន’ចំពោះ’ពួកអ្នក’ដែល’ព្រះជាម្ចាស់’បាន’ជ្រើសរើស ហើយ’ដែល’បាន’បែកខ្ញែក’គ្នា’ទៅ’ស្នាក់’នៅ’បណ្ដោះអាសន្ន’នៅ’ស្រុក’ប៉ុនតុស ស្រុក’កាឡាទី ស្រុក’កាប៉ាដូគា ស្រុក’អាស៊ី និង’ស្រុក’ប៉ីធូនា (I Peter 1:1 [KhmerNTx])
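The two renderings above differ only in whether U+2019 is shown. A minimal sketch of that toggle follows, assuming the hidden state simply deletes every U+2019. The real UTF8GreekAccents filter handles other codepoints as well and is not reproduced here; the function name `render` is hypothetical.

```python
DIVIDER = "\u2019"  # RIGHT SINGLE QUOTATION MARK, used here as a word divider


def render(text: str, show_dividers: bool) -> str:
    """Return the text with dividers left in place or stripped out."""
    return text if show_dividers else text.replace(DIVIDER, "")


# Opening words of 1 Peter 1:1 as segmented in the message above.
first_phrase = DIVIDER.join(["ខ្ញុំ", "ពេត្រុស"])
second_phrase = DIVIDER.join(["ជា", "សាវក", "របស់"])
marked = first_phrase + " " + second_phrase

if __name__ == "__main__":
    print(render(marked, show_dividers=True))
    print(render(marked, show_dividers=False))
```

One consequence of this design is that any genuine apostrophe or closing single quote encoded as U+2019 in the text would be hidden along with the dividers.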
>>
>> I have attached the compressed module for any of you to explore & play with further.
>>
>> Aside: The previous update already made use of the OSIS XML w element to enclose each lexical Khmer word. That remains the case.
>> In this way, the module source text is ready to be adapted for further enhancements such as adding Strong's numbers, etc, to make a Study Edition.
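For readers who want to picture the w markup, here is a small sketch that turns divider-marked text into a run of bare w elements. It assumes U+2019 marks word boundaries inside a phrase and that visible spaces are kept between phrases. The sketch drops the divider and adds no attributes; that is a simplification for illustration and not a description of the actual KhmerNTx source, which may differ.

```python
from xml.sax.saxutils import escape

DIVIDER = "\u2019"  # RIGHT SINGLE QUOTATION MARK used as the word divider


def to_w_elements(text: str) -> str:
    """Wrap each divider-separated word in <w>...</w>, keeping phrase spaces."""
    phrases = []
    for phrase in text.split(" "):
        wrapped = "".join(
            f"<w>{escape(word)}</w>" for word in phrase.split(DIVIDER) if word
        )
        phrases.append(wrapped)
    return " ".join(phrases)


if __name__ == "__main__":
    print(to_w_elements(DIVIDER.join(["ជា", "សាវក"]) + " " + "ស្រុក"))
```

Attributes such as a lemma reference for Strong's numbers could later be added to each w element without changing the segmentation.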
>>
>> Steve Hyde and the translators in Cambodia are currently preparing to publish the complete Khmer Bible.
>> He has requested my assistance in improving the actual word divisions for the 39 OT books.
>> I've already been sent the source text, exported from their database.
>>
>> Since early May, I have been exploring how the Grok AI engine can make a positive contribution to the success of this challenging task.
>> More on that subject later.
>>
>> Best regards,
>>
>> David
>>
>> Sent with [Proton Mail](https://pr.tn/ref/SWXT9A5YZ67G) secure email.
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page