<html><head></head><body> <div dir="auto">Unlike those of Hebrew, Arabic, and similar scripts, none of the Thai Unicode character names contain the word FINAL. Likewise for Myanmar letters.</div>
<div dir="auto"><br></div>
<div dir="auto">A possible way forward would be to run one of the several word segmentation programs on the source text of the ThaiKJV.</div>
<div dir="auto"><br></div>
<div dir="auto">Examples: KuCut, DeepCut, AttaCut</div>
<div dir="auto"><br></div>
<div dir="auto">The segmented output should use a Unicode zero width space (ZWSP, U+200B) as the word separator.</div>
<div dir="auto"><br></div>
<div dir="auto">NB: The module would have to be updated from the segmented source text.</div>
<div dir="auto"><br></div>
<div dir="auto">Visually, the resulting text would display the same as the original, but the module would be amenable to indexing for word searches.</div>
<div dir="auto"><br></div>
<div dir="auto">One difficulty that might then arise is how a front-end user would enter the query for an exact phrase search (one containing more than one word). The other search types (all words, any word) might be OK as they are.</div>
<div dir="auto"><br></div>
<div dir="auto">Aside: The KuCut method, developed in 2004, was originally trained using the text of the ThaiKJV.</div>
<div dir="auto"><br></div>Regards,<br><div dir="auto"><br></div><div dir="auto">David</div><div><br></div> <div id="protonmail_mobile_signature_block"><div>Sent from Proton Mail for iOS</div></div> <div><br></div><div><br></div>On Mon, Apr 17, 2023 at 17:16, Peter Von Kaehne &lt;<a class="" href="mailto:refdoc@gmx.net">refdoc@gmx.net</a>&gt; wrote:<blockquote type="cite" class="protonmail_quote"> <div style="font-family: Verdana;font-size: 12.0px;"><div>Do Thai, Burmese, etc. use end forms for letters? If so, are these encoded as such?</div>
<div> </div>
<div>Peter</div>
<div>
<div>
<div style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" name="quote">
<div style="margin:0 0 10px 0;"><b>Sent:</b> Monday, 17 April 2023 at 16:47<br>
<b>From:</b> "David Haslam" &lt;dfhdfh@protonmail.com&gt;<br>
<b>To:</b> sword-devel@crosswire.org<br>
<b>Subject:</b> [sword-devel] Languages without a space between words</div>
<div name="quoted-content">
<div>How (if at all) does the SWORD API generate a search index for a module that is for a language without a space between words?</div>
<div>
<pre style="letter-spacing: normal;text-indent: 0.0px;text-transform: none;word-spacing: 0.0px;text-decoration: none;box-sizing: border-box;margin: 15.0px 0.0px;border: 1.0px solid rgb(221,221,221);line-height: 19.0px;overflow: auto;padding: 6.0px 10.0px;"><code style="box-sizing: border-box;">Please consider how best to generate a useful search index for modules that are
for Bible translations in languages that have no spaces between words.
Example: CrossWire module ThaiKJV
See
https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries
Has this ever been considered before?</code></pre>
Best regards,</div>
<div> </div>
<div>David</div>
<div> </div>
_______________________________________________ sword-devel mailing list: sword-devel@crosswire.org <a target="_blank" href="http://crosswire.org/mailman/listinfo/sword-devel">http://crosswire.org/mailman/listinfo/sword-devel</a> Instructions to unsubscribe/change your settings at above page</div>
</div>
</div>
</div></div>
</blockquote></body></html>