<div dir="ltr"><div class="gmail_default" style="font-family:garamond,serif;font-size:large">I've been involved with translation projects (Thai) that let the translators do their thing, with the idea of introducing encoding afterward. The tendency with these languages is that if you leave it to the translator, there will be no mark between words, justlettersalltogetherinsequence, which is normal in their world. However, Bible Markup has verse units and word units, and to get down to the word unit, you need SOMETHING. <br><br>What we used to introduce them after the fact was a tool called kukut (or quecut?). It inserted a hair space wherever a dictionary/grammar algorithm suggested there should be a word break (a hair space because the typesetting program or Paratext abused other glyphs, but this was 2014). <br><br>Bottom line: a human saying where the words break is FAR better than a computer. You'll complete a project like this faster by addressing word breaks as early as possible, marking them in the early stages for proper markup, than by leaving it to a computer with a dictionary (and we're talking about minority languages where that dictionary is not even at alpha level). <br><br>However, when it came time to render the files as readable text on screen and paper, we ultimately reverted to tagging them with word tags and removing the fake zero width space. Regardless of the Unicode code point we used, it turned into an actual character in the text stream, which doubled the stretch or squeeze that publishing programs apply to justify lines, visibly introducing asymmetry into the text and causing complaints. More specifically, on tight lines (more letters than average), the word breaks were smaller than normal, and people complained. On loose lines (fewer letters than average), the words had visible space between them, but I don't recall anyone complaining. The complaints were universally about words being too close. 
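The reply does not say which dictionary/grammar algorithm that tool used. Greedy longest matching is one common dictionary-based approach, so the sketch below uses it purely as an illustration, with a toy English dictionary standing in for a real lexicon. The function names are hypothetical. Note how an unknown character simply falls through on its own, which is exactly where a human has to step in.

```python
# Illustrative sketch only: greedy longest-match segmentation with a toy
# dictionary. This is NOT the algorithm of the tool described above.

HAIR_SPACE = "\u200a"  # U+200A HAIR SPACE, the break marker the reply mentions


def segment_longest_match(text: str, dictionary: set[str], max_len: int = 20) -> list[str]:
    """Split run-together text into dictionary words, trying the longest match first.

    A character that starts no dictionary word is emitted on its own.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single unknown character
        # Try the longest candidate first, shrinking towards one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words


def insert_breaks(text: str, dictionary: set[str], marker: str = HAIR_SPACE) -> str:
    """Join the segmented words with an invisible break marker."""
    return marker.join(segment_longest_match(text, dictionary))


if __name__ == "__main__":
    toy_dict = {"just", "let", "letter", "letters", "all", "to", "together", "in", "sequence"}
    print(segment_longest_match("justlettersalltogetherinsequence", toy_dict))
    # ['just', 'letters', 'all', 'together', 'in', 'sequence']
```

Greedy matching is cheap but easily fooled when a long dictionary word swallows the start of the next one, which is consistent with the point that early human-marked breaks beat after-the-fact guessing.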
<br><br>So, from experience, using word tags is a lot more resilient across all the ways the text is used later, whether it's a SWORD module, paper, EPUB, or a projector screen in a church. <br><br>The same tool (kukut) also inserted Unicode code points marking where words could break with a hyphen. That was translated into a hyphenation database similar to modern Hunspell. By the time we finished the files, they had no embedded hyphenation points, for the same reason (the points would get stretched and squeezed, causing reader confusion). </div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, May 1, 2025 at 8:22 AM Peter von Kaehne <<a href="mailto:refdoc@gmx.net">refdoc@gmx.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div dir="ltr">David, that is misreading what I said.</div><div dir="ltr"><br></div><div dir="ltr">If we want to create a new feature, then it is the module maker's responsibility to create the markup. </div><div dir="ltr"><br></div><div dir="ltr">The markup that lends itself to toggling display is proper XML markup. </div><div dir="ltr"><br></div><div dir="ltr">If there are modules that currently use ZWNJ or anything else, then this is fine and good, but in that form they cannot and should not get the benefit of such a feature. They would require updating. </div><div id="m_-4657640466407500877ms-outlook-mobile-body-separator-line" dir="ltr"><br></div><div id="m_-4657640466407500877ms-outlook-mobile-signature">Sent from <a href="https://aka.ms/o0ukef" target="_blank">Outlook for iOS</a></div><div id="m_-4657640466407500877mail-editor-reference-message-container"><hr style="display:inline-block;width:98%"><div id="m_-4657640466407500877divRplyFwdMsg" dir="ltr"><span style="font-family:Calibri,sans-serif"><b>From:</b> David F. 
Haslam <<a href="mailto:df.haslam@btinternet.com" target="_blank">df.haslam@btinternet.com</a>><br><b>Sent:</b> Thursday, May 1, 2025 1:40 pm<br><b>To:</b> Peter von Kaehne <<a href="mailto:refdoc@gmx.net" target="_blank">refdoc@gmx.net</a>>; SWORD Developers' Collaboration Forum <<a href="mailto:sword-devel@crosswire.org" target="_blank">sword-devel@crosswire.org</a>><br><b>Subject:</b> Re: [sword-devel] Proposal for a new SWORD filter to display word dividers</span><div style="font-family:Calibri,sans-serif"> </div></div>
But we can help them towards that goal by making module development
less onerous.<br>
Then perhaps they might use our derived module to help them check
their translation at each stage<br>
without them having to keep asking us to rebuild the module for them
with all the demanding file format transformations such a task
entails.<br>
There's nothing that forbids us to accept a module containing ZWSP
characters <i>per se</i>.<br>
<br>
And, btw, existing CrossWire module <b>KhmerNT</b> contains 223,198
ZWSP.<br>
Since it was released on 2012-02-15, nobody has batted an eyelid at
its use of this means to mark lexical word boundaries.<br>
Not you, not me, not anyone in the core development team.<br>
So it's a bit rich to say over 13 years later that "it is our job to
.... and apply it".<br>
<br>
Our ministry as a Society should include actively assisting
translators, not merely distributing their finished product.<br>
<br>
Aside: We've not heard from any other team members yet.<br>
<br>
David<br>
<br>
<div>On 2025-05-01 13:07, Peter von Kaehne
wrote:<br>
</div><blockquote><div dir="ltr">I would not expect any Bible translator to do
anything.</div><div dir="ltr"><br>
</div><div dir="ltr">If they tell us they used whatever to mark up
whatever, then it is our job as the module team to take whatever and
find the appropriate semantic markup and apply it. </div><div dir="ltr"><br>
</div><div dir="ltr">This is not different. </div><div dir="ltr"><br>
</div><div dir="ltr">Peter</div><div id="m_-4657640466407500877ms-outlook-mobile-body-separator-line" dir="ltr"><br>
</div><div id="m_-4657640466407500877mail-editor-reference-message-container"><hr style="display:inline-block;width:98%"><div id="m_-4657640466407500877divRplyFwdMsg" dir="ltr"><span style="font-family:Calibri,sans-serif"><b>From:</b> sword-devel
<a href="mailto:sword-devel-bounces@crosswire.org" target="_blank"><sword-devel-bounces@crosswire.org></a> on behalf of David
Haslam <a href="mailto:dfhdfh@protonmail.com" target="_blank"><dfhdfh@protonmail.com></a><br>
<b>Sent:</b> Thursday, May 1, 2025 12:59 pm<br>
<b>To:</b> SWORD Developers' Collaboration Forum
<a href="mailto:sword-devel@crosswire.org" target="_blank"><sword-devel@crosswire.org></a><br>
<b>Cc:</b> David Haslam <a href="mailto:df.haslam@btinternet.com" target="_blank"><df.haslam@btinternet.com></a><br>
<b>Subject:</b> Re: [sword-devel] Proposal for a new SWORD
filter to display word dividers</span>
<div style="font-family:Calibri,sans-serif"> </div></div><div style="font-family:Arial,sans-serif;font-size:14px">Hi
Peter,</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">Undoubtedly,
but we cannot demand or expect most Bible translators to be
XML aficionados.</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">It's
even difficult to teach some members of a translation team to
use the ZWSP properly.<br>
<br>
"If you cannot see it, key it again" can so easily become the
<i>modus operandi</i>.</div><div style="font-family:Arial,sans-serif;font-size:14px">Witness
the following in the same chapter prior to my involvement.</div><div style="font-family:Arial,sans-serif;font-size:14px">After
I replaced every ZWSP with a MIDDLE DOT, just look at the tangle!!!<br>
<i>See attached text file</i>.<br>
<br>
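A tangle like that is easy to flag mechanically. The sketch below, with hypothetical helper names, finds runs of two or more adjacent zero-width dividers (the typical result of "if you cannot see it, key it again"), makes them visible with MIDDLE DOT as was done by hand, and can collapse each run to one divider. Collapsing to ZWSP is an assumption; a run that mixes ZWSP and ZWNJ should really be reviewed by a person first. A missing divider cannot be detected this way.

```python
import re

# Zero-width characters the proposal lists as possible word dividers.
# U+200D ZERO WIDTH JOINER is excluded, as in the proposal.
DIVIDERS = "\u200b\u200c\ufeff"
DIVIDER_RUN = re.compile(f"[{DIVIDERS}]{{2,}}")  # two or more dividers in a row


def find_divider_runs(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every run of adjacent dividers."""
    return [m.span() for m in DIVIDER_RUN.finditer(text)]


def show_dividers(text: str, visible: str = "\u00b7") -> str:
    """Replace every divider with a visible character (MIDDLE DOT by default)."""
    return text.translate({ord(ch): visible for ch in DIVIDERS})


def collapse_divider_runs(text: str, keep: str = "\u200b") -> str:
    """Replace each run of adjacent dividers with a single divider (ZWSP by default)."""
    return DIVIDER_RUN.sub(keep, text)


if __name__ == "__main__":
    sample = "ab\u200b\u200bcd\u200bef"  # "ab" and "cd" separated by a doubled ZWSP
    print(find_divider_runs(sample))                      # [(2, 4)]
    print(show_dividers(sample))                          # ab··cd·ef
    print(show_dividers(collapse_divider_runs(sample)))   # ab·cd·ef
```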
So we should do "belt and braces" to help the weak. </div><div style="font-family:Arial,sans-serif;font-size:14px">Also
called "going the extra mile". 😎</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">But
worry not. My feedback is already helping the Khmer
translation team. </div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">Best
regards,<br>
<br>
David
</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">Sent
with <a href="https://pr.tn/ref/SWXT9A5YZ67G" target="_blank">Proton Mail</a> secure email.
</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div>On Thursday, May 1st, 2025 at
12:47 PM, Peter von Kaehne <a href="mailto:refdoc@gmx.net" target="_blank"><refdoc@gmx.net></a> wrote:<br>
</div><blockquote><div dir="ltr">I think this is not
difficult per se, but it should be properly encoded. </div><div dir="ltr"><br>
</div><div dir="ltr"><w> seems
correct; using zero width characters does not. </div><div dir="ltr"><br>
</div><div dir="ltr">Peter</div><div dir="ltr" id="m_-4657640466407500877ms-outlook-mobile-body-separator-line"><br>
</div><div id="m_-4657640466407500877mail-editor-reference-message-container"><hr style="display:inline-block;width:98%"><div dir="ltr" id="m_-4657640466407500877divRplyFwdMsg"><span style="font-family:Calibri,sans-serif"><b>From:</b> sword-devel
<a href="mailto:sword-devel-bounces@crosswire.org" target="_blank"><sword-devel-bounces@crosswire.org></a> on behalf of
David Haslam <a href="mailto:dfhdfh@protonmail.com" target="_blank"><dfhdfh@protonmail.com></a><br>
<b>Sent:</b> Thursday, May 1, 2025 11:30 am<br>
<b>To:</b> sword-devel mailing list
<a href="mailto:sword-devel@crosswire.org" target="_blank"><sword-devel@crosswire.org></a><br>
<b>Cc:</b> David Haslam <a href="mailto:df.haslam@btinternet.com" target="_blank"><df.haslam@btinternet.com></a><br>
<b>Subject:</b> [sword-devel] Proposal for a new SWORD
filter to display word dividers</span>
<div style="font-family:Calibri,sans-serif"> </div></div><div style="font-family:Arial,sans-serif;font-size:14px">I
wish to propose that we design in a new SWORD filter.<br>
<br>
The conf key would be:</div><ul style="margin-top:0px;margin-bottom:0px"><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc"><b>GlobalOptionFilter=ShowWordDividers</b></li></ul><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">In
the writing systems for the various languages of SE Asia (<b>Thai</b>,
<b>Khmer</b>, <b>Lao</b>, <b>Myanmar</b>) there is
[generally] <b>no space between words</b>.</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">In
this respect, they are like many European languages before
the start of <a href="https://www.amazon.com/Space-Between-Words-Origins-Medieval/dp/080474016X" title="silent reading" rel="noreferrer nofollow noopener" target="_blank">silent reading</a>. The
descriptive term is <b><i>Scriptura Continua</i></b>.</div><div style="font-family:Arial,sans-serif;font-size:14px"><br>
Some Bible translations for this region are already making
use of one of the ZERO WIDTH characters to invisibly mark
the divisions between lexical words.</div><div style="font-family:Arial,sans-serif;font-size:14px">Options
include:</div><ul style="margin-top:0px;margin-bottom:0px"><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc">U+200B
ZERO WIDTH SPACE</li><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc">U+200C
ZERO WIDTH NON-JOINER</li><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc">U+FEFF
ZERO WIDTH NO-BREAK SPACE</li></ul><div style="font-family:Arial,sans-serif;font-size:14px">They
exclude:</div><ul style="margin-top:0px;margin-bottom:0px"><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc">U+200D
ZERO WIDTH JOINER</li></ul><div style="font-family:Arial,sans-serif;font-size:14px">A
further possibility, even without requiring a full study
Bible with Strong's, etc., is to simply wrap each lexical
word within the OSIS <b>w</b> element.</div><div style="font-family:Arial,sans-serif;font-size:14px">One
without any OSIS attributes would suffice for this
purpose. Likewise, for the <b>seg</b> element.</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
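Going from divider-marked text to bare OSIS w elements, with no attributes as suggested above, is a small mechanical step. The sketch below assumes that ordinary spaces also separate words, that leading and trailing punctuation (such as the Khmer khan or quotation marks) belongs outside the element, and that the invisible dividers are dropped once the tags carry the boundaries. The function name is hypothetical, and osisID or lemma attributes are deliberately omitted.

```python
import re
import unicodedata
from xml.sax.saxutils import escape

# Zero-width characters the proposal lists as possible word dividers.
DIVIDERS = "\u200b\u200c\ufeff"
# Split on a single divider or a run of whitespace, keeping the separators.
SEPARATOR = re.compile(rf"([{DIVIDERS}]|\s+)")


def _is_punct(ch: str) -> bool:
    return unicodedata.category(ch).startswith("P")


def wrap_words_in_osis_w(text: str) -> str:
    """Wrap each lexical word in a bare <w> element and drop the invisible dividers."""
    out: list[str] = []
    for piece in SEPARATOR.split(text):
        if not piece or (len(piece) == 1 and piece in DIVIDERS):
            continue  # empty split artefact or the divider itself
        if piece.isspace():
            out.append(piece)  # ordinary spaces stay between elements
            continue
        # Keep leading/trailing punctuation outside the <w> element.
        start, end = 0, len(piece)
        while start < end and _is_punct(piece[start]):
            start += 1
        while end > start and _is_punct(piece[end - 1]):
            end -= 1
        out.append(escape(piece[:start]))
        if start < end:
            out.append(f"<w>{escape(piece[start:end])}</w>")
        out.append(escape(piece[end:]))
    return "".join(out)


if __name__ == "__main__":
    print(wrap_words_in_osis_w("ab\u200bcd ef!"))  # <w>ab</w><w>cd</w> <w>ef</w>!
```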
</div><div style="font-family:Arial,sans-serif;font-size:14px">My
proposal is that we design a feature to <b>show/hide word
dividers</b> by displaying them using a suitable visible
but non-intrusive character.</div><div style="font-family:Arial,sans-serif;font-size:14px">My
suggestion is to use this Unicode character by default:</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><ul style="margin-top:0px;margin-bottom:0px"><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc">U+00B7
MIDDLE DOT</li></ul><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
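At its core, the proposed show/hide behaviour is a character substitution. The sketch below shows only that text transformation, written in Python for brevity; a real SWORD option filter would be a C++ class in the library, and nothing here reflects its actual API. The function name is hypothetical. With the option off, the text passes through unchanged, so the dividers stay invisible. U+200D is left untouched, matching the exclusion above.

```python
# Zero-width characters the proposal lists as possible word dividers.
# U+200D ZERO WIDTH JOINER is excluded, as in the proposal.
WORD_DIVIDERS = ("\u200b", "\u200c", "\ufeff")
MIDDLE_DOT = "\u00b7"


def apply_show_word_dividers(text: str, show: bool, visible: str = MIDDLE_DOT) -> str:
    """Hypothetical ShowWordDividers transform: swap invisible dividers for a visible one."""
    if not show:
        return text  # option off: dividers remain invisible
    table = {ord(ch): visible for ch in WORD_DIVIDERS}
    return text.translate(table)


if __name__ == "__main__":
    print(apply_show_word_dividers("នេះ\u200bជា\u200bសុបិន", show=True))  # នេះ·ជា·សុបិន
```

For modules that mark words with the w element instead, the equivalent filter would insert the visible character between adjacent elements; that variant is not shown.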
</div><div style="font-family:Arial,sans-serif;font-size:14px">We
could even allow the actual visible character to be
specified in a second conf key, thus:</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><ul style="margin-top:0px;margin-bottom:0px"><li style="font-family:Arial,sans-serif;font-size:14px;list-style-type:disc">VisibleWordDivider=U+00B7</li></ul><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
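If the second key were adopted, a front-end would need to turn its value into a character. A minimal sketch, assuming the value is either U+XXXX notation (as in the example above) or the literal character itself, and that an empty or whitespace-only value falls back to MIDDLE DOT. Both the key and this parser are proposals, not existing SWORD behaviour.

```python
import re
from typing import Optional

MIDDLE_DOT = "\u00b7"
_CODEPOINT = re.compile(r"[Uu]\+([0-9A-Fa-f]{4,6})")


def parse_visible_word_divider(value: Optional[str], default: str = MIDDLE_DOT) -> str:
    """Turn a hypothetical VisibleWordDivider conf value into the character to display.

    "U+XXXX" is decoded as a code point; any other non-blank value is used literally;
    a missing, empty or whitespace-only value falls back to the default.
    """
    if value is None or not value.strip():
        return default
    value = value.strip()
    m = _CODEPOINT.fullmatch(value)
    if m:
        return chr(int(m.group(1), 16))  # raises ValueError above U+10FFFF
    return value


if __name__ == "__main__":
    print(parse_visible_word_divider("U+00B7"))  # ·
```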
</div><div style="font-family:Arial,sans-serif;font-size:14px">Benefits
would include:</div><ul style="margin-top:0px;margin-bottom:0px"><li style="font-family:Arial,sans-serif;font-size:14px">Helps
with language learning to know where lexical words start
and end</li><li style="font-family:Arial,sans-serif;font-size:14px">Helps
with front-end search for whole words, exact phrase or
all words</li><li style="font-family:Arial,sans-serif;font-size:14px">Helps
with checking the accuracy of Bible translations by
clearly displaying lexical word boundaries at the touch
of a single key in the front-end</li><li style="font-family:Arial,sans-serif;font-size:14px">Paves
the way for a study Bible with the addition of Strong's
mark-up, etc.</li></ul><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
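The search benefit follows from having explicit boundaries: without them, a substring search for a short word also matches inside longer words. The sketch below, with hypothetical function names, splits verse text at the zero-width dividers and ordinary whitespace, then checks for a whole word, an exact phrase, or all words. It assumes punctuation is either space-separated (as with the Khmer "!" in the sample below) or otherwise handled elsewhere; it says nothing about how any existing front-end implements search.

```python
from typing import Sequence

# Zero-width characters the proposal lists as possible word dividers.
DIVIDERS = ("\u200b", "\u200c", "\ufeff")


def lexical_words(text: str) -> list[str]:
    """Split text into lexical words at zero-width dividers and ordinary whitespace."""
    for d in DIVIDERS:
        text = text.replace(d, " ")  # str.split() does not treat ZWSP as whitespace
    return text.split()


def has_whole_word(text: str, word: str) -> bool:
    return word in lexical_words(text)


def has_exact_phrase(text: str, phrase: Sequence[str]) -> bool:
    words, target = lexical_words(text), list(phrase)
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def has_all_words(text: str, wanted: Sequence[str]) -> bool:
    present = set(lexical_words(text))
    return all(w in present for w in wanted)


if __name__ == "__main__":
    print("cd" in "abcd", has_whole_word("abcd", "cd"))  # True False
```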
</div><div style="font-family:Arial,sans-serif;font-size:14px">Here's
a sample of Khmer verse text with the MIDDLE DOT as the
visible word divider:</div><blockquote style="padding-left:10px;border-left:3px solid rgb(200,200,200)"><div style="font-family:Arial,sans-serif;font-size:14px;color:rgb(102,102,102)"><b>Obad.1.1</b> </div><table style="width:561pt;box-sizing:border-box;border-collapse:collapse;border-spacing:0px"><tbody><tr><td align="left" style="width:561pt;height:29.25pt;border-width:0.5pt medium 0.5pt 0.5pt;border-style:solid none solid solid;border-color:white currentcolor white white;background-color:rgb(184,204,228);padding-top:1px;padding-right:1px;padding-left:1px;vertical-align:top;color:black"><div style="font-family:Calibri,sans-serif;font-size:11pt">នេះ·ជា·សុបិន·និមិត្ដ·របស់·លោក·អូបាឌា
ព្រះអម្ចាស់·ជា·ព្រះ·មាន·បន្ទូល·ពី·ក្រុង·អេដំម ។
យើង·បាន·ឮ·ដំណឹង·មក·ពី·ព្រះអម្ចាស់
គឺ·មាន·ទូត·ម្នាក់·បាន·បញ្ជូន·ឲ្យ·ទៅ
ក្នុង·ចំណោម·ជន·ជាតិ·ទាំង·ឡាយ·ដោយ·ពាក្យ·ថា
"ចូរ·ក្រោក·ឡើង !
ចូរ·យើង·ក្រោក·ឡើង·ធ្វើ·ចម្បាំង·ទាស់·និង·គេ"</div></td></tr></tbody></table></blockquote><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">cf.
Here's what it looks like with the ZWSP as the in<span style="background-color:rgb(255,255,255)">visible
word </span>divider:</div><blockquote style="padding-left:10px;border-left:3px solid rgb(200,200,200)"><div style="font-family:Arial,sans-serif;font-size:14px;color:rgb(102,102,102)"><b>Obad.1.1</b></div><table style="width:561pt;box-sizing:border-box;border-collapse:collapse;border-spacing:0px"><tbody><tr><td align="left" style="width:561pt;height:29.25pt;border-width:0.5pt medium 0.5pt 0.5pt;border-style:solid none solid solid;border-color:white currentcolor white white;background-color:rgb(184,204,228);padding-top:1px;padding-right:1px;padding-left:1px;vertical-align:top;color:black"><div style="font-family:Calibri,sans-serif;font-size:11pt">នេះជាសុបិននិមិត្ដរបស់លោកអូបាឌា
ព្រះអម្ចាស់ជាព្រះមានបន្ទូលពីក្រុងអេដំម ។
យើងបានឮដំណឹងមកពីព្រះអម្ចាស់
គឺមានទូតម្នាក់បានបញ្ជូនឲ្យទៅ
ក្នុងចំណោមជនជាតិទាំងឡាយដោយពាក្យថា
"ចូរក្រោកឡើង !
ចូរយើងក្រោកឡើងធ្វើចម្បាំងទាស់និងគេ"</div></td></tr></tbody></table></blockquote><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">If
SWORD developers agree that my proposal merits
consideration, please would you start on the software
development.</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div><div style="font-family:Arial,sans-serif;font-size:14px">
Best regards,<br>
<br>
David
</div><div dir="ltr" style="font-family:Arial,sans-serif;font-size:14px"><br>
</div></div><div> </div><p>_______________________________________________<br>
sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org" style="margin-top:0px;margin-bottom:0px" target="_blank">sword-devel@crosswire.org</a><br>
<a href="http://crosswire.org/mailman/listinfo/sword-devel" style="margin-top:0px;margin-bottom:0px" target="_blank">http://crosswire.org/mailman/listinfo/sword-devel</a><br>
Instructions to unsubscribe/change your settings at above
page<br>
</p></blockquote><div><br>
</div></div><div> </div></blockquote><br>
</div><div> </div></div>
</blockquote></div>