<html><head></head><body>   <div dir="auto">Thanks Troy</div><div dir="auto"><br></div><div dir="auto">One detail you omitted is that search ignores punctuation and is case-insensitive for bicameral scripts.</div><div dir="auto"><br></div><div dir="auto">E.g., an exact phrase search of the KJV for “verily verily” will find “Verily, verily, …”</div><div dir="auto"><br></div><div dir="auto">Referring to the earlier discussion, would SWORD search count the ZWNJ as a space?</div><div dir="auto"><br></div><div dir="auto">David</div><div><br></div> <div id="protonmail_mobile_signature_block"><div>Sent from Proton Mail for iOS</div></div> <div><br></div><div><br></div>On Tue, Apr 18, 2023 at 01:08, Troy A. Griffitts <<a class="" href="mailto:scribe@crosswire.org">scribe@crosswire.org</a>> wrote:<blockquote type="cite" class="protonmail_quote">
  
    
  
  
    <p>Great suggestions all.  One thing to interject: SWORD raw search
      simply looks for needles in a haystack; it doesn't break the
      haystack into words at all.  The multi-word search type breaks the
      query up at spaces, e.g., if you search for "God love world"
      and specify multi-word, then you effectively get a search for 3
      needles. The "phrase" search type takes the search term as one needle.
      Whether or not that would be more or less useful here, I'll let
      the language-informed determine.<br>
    </p>
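    <p>The difference between the two search types can be sketched in a
      few lines of Python. This is only an illustration of the
      needle-in-a-haystack idea described above, not SWORD's actual code:
      the function names are hypothetical, and case and punctuation
      handling are left out.</p>
    <pre><code>def phrase_match(haystack, query):
    """Phrase search: the whole query is treated as a single needle."""
    return query in haystack


def multi_word_match(haystack, query):
    """Multi-word search: split the query at spaces into several needles.

    The haystack itself is never split into words; each needle is
    looked for as a plain substring, and all of them must be present.
    """
    needles = [needle for needle in query.split(" ") if needle]
    return all(needle in haystack for needle in needles)


verse = "For God so loved the world, that he gave his only begotten Son"

print(phrase_match(verse, "God love world"))      # False
print(multi_word_match(verse, "God love world"))  # True
</code></pre>
    <p>Note that "love" matches inside "loved" because the haystack is
      never broken into words. For the same reason, a multi-word query
      typed with spaces between the words could still find hits in a
      text that has no spaces at all.</p>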
    <div class="moz-cite-prefix">On 4/17/23 11:24, Greg Hellings wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>Yes, that looks like the type of thing. Although that is
          for Lucene (Java). I don't know the status of CLucene's
          implementation of that nor of Xapian's. But that would be the
          proper place for such processing to occur. If those libraries
          do not have one, interested parties could submit one. They
          could probably develop it inside of the SWORD library to be
          sure it's doing what they want it to do (I believe those
          filters are designed to be pluggable by the calling
          application) before submitting it to those projects for
          inclusion.</div>
        <div><br>
        </div>
        <div>--Greg<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div class="gmail_attr" dir="ltr">On Mon, Apr 17, 2023 at
          1:12 PM David Haslam <<a class="moz-txt-link-freetext" href="mailto:dfhdfh@protonmail.com">dfhdfh@protonmail.com</a>>
          wrote:<br>
        </div>
        <blockquote style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">
          <div>
            <div dir="auto">Thanks, Greg.</div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">I just came across this</div>
            <div dir="auto"><br>
            </div>
            <div dir="auto"><a class="moz-txt-link-freetext" target="_blank" dir="auto" href="https://lucene.apache.org/core/3_2_0/api/contrib-analyzers/org/apache/lucene/analysis/th/ThaiWordFilter.html">https://lucene.apache.org/core/3_2_0/api/contrib-analyzers/org/apache/lucene/analysis/th/ThaiWordFilter.html</a><br>
            </div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">Is that the kind of thing you were thinking
              of?</div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">David</div>
            <div><br>
            </div>
            <div><br>
            </div>
            <div><br>
            </div>
            On Mon, Apr 17, 2023 at 17:51, Greg Hellings <<a target="_blank" href="mailto:greg.hellings@gmail.com">greg.hellings@gmail.com</a>>
            wrote:
            <blockquote type="cite">
              <div dir="ltr">
                <div>I don't believe you're going to get that sort of
                  feature directly in the engine's simple search.</div>
                <div><br>
                </div>
                <div>However, if you're using a build of the library
                  that utilizes CLucene or Xapian, then that should be
                  the function of those libraries. They are supposed to
                  be able to handle all of that type of functionality if
                  the language has a corresponding contribution to that
                  library. It might be better to check in with them.</div>
                <div><br>
                </div>
                <div>--Greg<br>
                </div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Mon, Apr 17, 2023
                  at 11:46 AM David Haslam <<a class="moz-txt-link-freetext" target="_blank" href="mailto:dfhdfh@protonmail.com">dfhdfh@protonmail.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px
                  0.8ex;border-left:1px solid
                  rgb(204,204,204);padding-left:1ex">
                  <div>
                    <div dir="auto">Unlike Hebrew, Arabic, etc., none
                      of the names of the Thai <span dir="auto">Unicode
                      </span>characters contain the word FINAL. <span>Likewise
                        for Myanmar letters.</span></div>
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">A possible way forward might be to
                      run one of the several Word Segmentation programs
                      on the text of the ThaiKJV.</div>
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">Examples: KuCut, DeepCut, AttaCut</div>
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">This should insert a Unicode zero
                      width non-joiner (ZWNJ) as a word separator.</div>
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">NB. The module would have to be
                      updated using the segmented source text.</div>
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">Visually, the resulting text would
                      display the same as the original, but the module
                      would be amenable to indexing for word searches.</div>
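                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">A minimal Python sketch of that idea,
                      assuming the word list has already been produced by
                      one of the segmentation programs. The Thai words
                      below are segmented by hand purely for illustration,
                      and the helper names are hypothetical. The separator
                      is a parameter, so another invisible character such
                      as ZERO WIDTH SPACE (U+200B) could be substituted.</div>
                    <pre><code>ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, the separator proposed above


def join_segments(words, sep=ZWNJ):
    """Rebuild the verse text with an invisible separator between words."""
    return sep.join(words)


def index_tokens(text, sep=ZWNJ):
    """Recover the word tokens an indexer could store for word searches."""
    return [token for token in text.split(sep) if token]


# Hand-segmented example (an assumption, not real segmenter output).
words = ["พระเจ้า", "ทรง", "รัก", "โลก"]

segmented = join_segments(words)
print(index_tokens(segmented) == words)  # True
</code></pre>
                    <div dir="auto">Removing the separator again gives back
                      the original unsegmented string, which is why the
                      text itself is unchanged apart from the inserted
                      characters.</div>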
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">A difficulty that might then arise
                      is how the front-end user might enter the search
                      query for an exact phrase search type (containing
                      more than one word). Other search types (all
                      words, any word) might be OK as is.</div>
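                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">One possible way around the phrase
                      problem, sketched in Python under the assumption
                      that the front-end is free to normalise both the
                      module text and the query before comparing them.
                      The function names are hypothetical and this is not
                      how SWORD currently behaves.</div>
                    <pre><code>ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER used as the word separator


def strip_separators(text, sep=ZWNJ):
    """Remove the invisible word separators from a string."""
    return text.replace(sep, "")


def phrase_match(haystack, query, sep=ZWNJ):
    """Phrase search that ignores separators on both sides.

    The user can type the phrase with or without separators and still
    match the segmented module text.
    """
    return strip_separators(query, sep) in strip_separators(haystack, sep)


# Hand-segmented example text (an assumption, for illustration only).
haystack = ZWNJ.join(["พระเจ้า", "ทรง", "รัก", "โลก"])

print("รักโลก" in haystack)              # False: the separator gets in the way
print(phrase_match(haystack, "รักโลก"))  # True once separators are ignored
</code></pre>
                    <div dir="auto">The trade-off is that a match found
                      this way may begin or end in the middle of a word,
                      just as with the raw substring search on the
                      unsegmented text.</div>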
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">Aside: The KuCut method, developed in
                      2004, was originally trained using the text of the
                      ThaiKJV.</div>
                    <div dir="auto"><br>
                    </div>
                    Regards,<br>
                    <div dir="auto"><br>
                    </div>
                    <div dir="auto">David</div>
                    <div><br>
                    </div>
                    <div><br>
                    </div>
                    <div><br>
                    </div>
                    On Mon, Apr 17, 2023 at 17:16, Peter Von Kaehne <<a target="_blank" href="mailto:refdoc@gmx.net">refdoc@gmx.net</a>>
                    wrote:
                    <blockquote type="cite">
                      <div style="font-family:Verdana;font-size:12px">
                        <div>Do Thai, Burmese, etc. use final forms for
                          letters? If so, are these encoded as such?</div>
                        <div> </div>
                        <div>Peter</div>
                        <div>
                          <div>
                            <div style="margin:10px 5px 5px
                              10px;padding:10px 0px 10px
                              10px;border-left:2px solid
                              rgb(195,217,229)" name="quote">
                              <div style="margin:0px 0px 10px"><b>Sent:</b>
                                Monday, 17 April 2023 at 16:47<br>
                                <b>From:</b> "David Haslam" <<a class="moz-txt-link-freetext" target="_blank" href="mailto:dfhdfh@protonmail.com">dfhdfh@protonmail.com</a>><br>
                                <b>To:</b> <a class="moz-txt-link-freetext" target="_blank" href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
                                <b>Subject:</b> [sword-devel] Languages
                                without a space between words</div>
                              <div name="quoted-content">
                                <div>How (if at all) does the SWORD API
                                  generate a search index for a module
                                  that is for a language without a space
                                  between words?</div>
                                <div>
                                  <pre style="letter-spacing:normal;text-indent:0px;text-transform:none;word-spacing:0px;text-decoration:none;box-sizing:border-box;margin:15px 0px;border:1px solid rgb(221,221,221);line-height:19px;overflow:auto;padding:6px 10px"><code style="box-sizing:border-box">Please consider how best to generate a useful search index for modules that are
for Bible translations in languages that have no spaces between words.

Example: CrossWire module ThaiKJV

See
<a class="moz-txt-link-freetext" target="_blank" href="https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries">https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries</a>

Has this ever been considered before?</code></pre>
                                  Best regards,</div>
                                <div> </div>
                                <div>David</div>
                                <div> </div>
_______________________________________________ sword-devel mailing
                                list: <a class="moz-txt-link-freetext" target="_blank" href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a>
                                <a class="moz-txt-link-freetext" target="_blank" href="http://crosswire.org/mailman/listinfo/sword-devel">http://crosswire.org/mailman/listinfo/sword-devel</a>
                                Instructions to unsubscribe/change your
                                settings at above page</div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
  

</blockquote></body></html>