<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Tobias,</p>
<p>I am sorry I haven't had a chance to look into this sooner.</p>
<p>On a quick look, I didn't see the same issue you are seeing:</p>
<p>```</p>
<p>[tgriffitts@fedora sword]$ cd examples/cmdline/<br>
[tgriffitts@fedora cmdline]$ make<br>
[tgriffitts@fedora cmdline]$ ./search KJV "generation to
generation"<br>
[0=================================50===============================100]<br>
======================================================================<br>
<br>
Exod 17:16; Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39;
Lam 5:19; Dan 4:3; Dan 4:34; Joel 3:20; Luke 1:50<br>
<br>
Exod 17:16<br>
Isa 13:20<br>
Isa 34:10<br>
Isa 34:17<br>
Isa 51:8<br>
Jer 50:39<br>
Lam 5:19<br>
Dan 4:3<br>
Dan 4:34<br>
Joel 3:20<br>
Luke 1:50<br>
[tgriffitts@fedora cmdline]$ <br>
<br>
```</p>
<p>Looking into things a bit more:</p>
<p>```</p>
<p>[tgriffitts@fedora sword]$ ./usrinst.sh</p>
<p>...<br>
</p>
<p>Configuration:<br>
<br>
Settings:<br>
LIBDIR: /usr/lib64<br>
DEBUG: yes<br>
PROFILE: no<br>
BUILD TESTS: yes<br>
BUILD EXAMPLES: no<br>
BUILD UTILITIES: yes<br>
STRIP LOG DEBUG: no<br>
STRIP LOG INFO: no<br>
<br>
Dependencies for standard use:<br>
REGEX: yes<br>
ZLIB: yes<br>
LIBICU: yes<br>
LIBCURL: yes<br>
CLUCENE-CORE: yes 2.x<br>
<br>
Optional / Experimental:<br>
LIBCURL SFTP: yes<br>
BZIP2: no<br>
XZ: no<br>
ICUSWORD: no<br>
ICU-REGEX: yes<br>
CXX11-REGEX: no<br>
CXX11-TIME: yes<br>
XAPIAN-CORE: no<br>
GAPI: no<br>
</p>
<p>```<br>
</p>
<div class="moz-cite-prefix">If I edit usrinst.sh and uncomment the
line:<br>
# --without-icu<br>
<br>
so that the configuration report at the end of ./usrinst.sh shows:<br>
<br>
```<br>
LIBICU: no<br>
```<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Then:<br>
```<br>
[tgriffitts@fedora sword]$ make clean</div>
<div class="moz-cite-prefix">[tgriffitts@fedora sword]$ make -j</div>
<div class="moz-cite-prefix">[tgriffitts@fedora sword]$ cd
examples/cmdline</div>
<div class="moz-cite-prefix">[tgriffitts@fedora cmdline]$ make clean</div>
<div class="moz-cite-prefix">[tgriffitts@fedora cmdline]$ make</div>
<div class="moz-cite-prefix">[tgriffitts@fedora cmdline]$ ./search
KJV "generation to generation"<br>
[0=================================50===============================100]<br>
======================================================================<br>
<br>
Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Dan 4:3; Dan
4:34; Joel 3:20; Luke 1:50<br>
<br>
Isa 13:20<br>
Isa 34:10<br>
Isa 34:17<br>
Isa 51:8<br>
Jer 50:39<br>
Dan 4:3<br>
Dan 4:34<br>
Joel 3:20<br>
Luke 1:50<br>
</div>
<div class="moz-cite-prefix">```<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Bottom line: there was a bug in the
non-ICU toupper when no maxlen was passed. Instead of copying the
entire uppercased string to the destination buffer, it copied no
characters and set the terminating null at the start of the
destination string, leaving it empty.</div>
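<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">To make the failure mode concrete, here
is a minimal standalone sketch of the length calculation from the
patch, before and after the fix. Only the two ternary expressions and
the memcpy/terminator lines are taken from stringmgr.cpp; the names
copyLenOld, copyLenNew and writeUpper are hypothetical, maxlen is
assumed to be unsigned, and maxlen == 0 is taken to mean that no
maximum length was passed.</div>
<pre wrap="" class="moz-quote-pre">
```cpp
#include <cstddef>
#include <cstring>
#include <string>

// maxlen is assumed to be unsigned; 0 means "no maximum length passed".

// Length expression before rev 3898: with no maxlen, nothing is copied.
long copyLenOld(std::size_t textSize, unsigned int maxlen) {
	return maxlen ? (textSize < maxlen ? textSize : (maxlen - 1)) : 0;
}

// Length expression after rev 3898: with no maxlen, the whole string is copied.
long copyLenNew(std::size_t textSize, unsigned int maxlen) {
	return maxlen ? (textSize < maxlen ? textSize : (maxlen - 1)) : textSize;
}

// Mirrors the two lines that follow `len` in the patch: copy len bytes of the
// uppercased text, then write the terminating null at dest[len].
void writeUpper(char *dest, const std::string &upper, long len) {
	if (len) std::memcpy(dest, upper.c_str(), len);
	dest[len] = 0;
}
```
</pre>
<div class="moz-cite-prefix">With the old expression, calling
writeUpper(dest, "LORD", copyLenOld(4, 0)) leaves dest as an empty
string, because the terminator is written at dest[0].</div>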
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">The reason we do the toupper at all is
that the OSISPlain filter does double duty. It acts as the strip
filter that prepares the text for searching, and it also acts as
the render filter when asked to render the verse as plain text.
The toupper changes divineName entries such as Lord to LORD for
plain-text output.</div>
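<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">For the render side of that double duty,
here is a minimal sketch of what the /divineName branch in
osisplain.cpp does to the output buffer: it uppercases, in place, the
text written since the opening tag (the last text node). It is a
simplification that uses std::string and std::toupper in place of
SWBuf and toupperstr, assumes ASCII text, and the name
upperLastTextNode is hypothetical.</div>
<pre wrap="" class="moz-quote-pre">
```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the "/divineName" branch in osisplain.cpp.
// Uppercases the last lastTextNodeSize bytes of buf in place, starting at
// the same offset as `end` there: buf.size() - lastTextNode.size().
// Assumes ASCII text (std::toupper works byte by byte) and
// lastTextNodeSize <= buf.size().
void upperLastTextNode(std::string &buf, std::size_t lastTextNodeSize) {
	const std::size_t start = buf.size() - lastTextNodeSize;
	for (std::size_t i = start; i < buf.size(); ++i) {
		buf[i] = (char)std::toupper((unsigned char)buf[i]);
	}
}
```
</pre>
<div class="moz-cite-prefix">For example, after "Because the Lord" has
been written and the last text node is "Lord" (4 bytes),
upperLastTextNode(buf, 4) turns the buffer into "Because the LORD".</div>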
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Thanks for finding this bug and
spending the time to pinpoint where the problem occurs. Great
help. I appreciate you.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Troy</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Patch:</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">
<pre wrap="" class="moz-quote-pre">Modified: trunk/src/mgr/stringmgr.cpp
===================================================================
--- trunk/src/mgr/stringmgr.cpp 2025-03-03 13:48:49 UTC (rev 3897)
+++ trunk/src/mgr/stringmgr.cpp 2025-03-03 13:49:36 UTC (rev 3898)
@@ -238,7 +238,7 @@
it = toUpperData.find(ch);
getUTF8FromUniChar(it == toUpperData.end() ? ch : it->second, &text);
}
- long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : 0;
+ long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : text.size();
if (len) memcpy(t, text.c_str(), len);
t[len] = 0;
#endif
</pre>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 3/1/25 8:09 AM, Tobias Klein wrote:<br>
</div>
<blockquote type="cite"
cite="mid:fb833378-67b1-49d1-ac34-6bb84690bb77@tklein.info">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<p>Hi Troy,</p>
<p>can this be fixed in SWORD?<br>
<br>
This bug impacts the search function quite significantly. I
noticed it when my standard test scenario for search started to
fail after my adjustments.<br>
The reason was that the number of search results for my test
scenario increased significantly, and I had to adjust the
expected results.<br>
The test scenario searches for "faith" in KJV. Previously
(before the bugfix) I expected 324 search results.<br>
After the bugfix/change mentioned below there are now 338 search
results. So you can see that quite a few verses (14 in this case)
are missed by the search function because of this bug.<br>
<br>
Best regards,<br>
Tobias<br>
</p>
<div class="moz-cite-prefix">On 2/23/25 18:38, David Haslam wrote:<br>
</div>
<blockquote type="cite"
cite="mid:Q0tzyHcoo75zfFKSstcVygPYU6qWZjn62apU1q5-80StrTiR6TjnpMqrWpWV4yTlbn6hQWjp8YAJdTv-O4lfV9VYgrOAnLcdzFH9Td8S9Io=@protonmail.com">
<meta http-equiv="content-type"
content="text/html; charset=UTF-8">
<div style="font-family: Arial, sans-serif; font-size: 14px;">Excellent
sleuthing, Tobias!</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br>
</div>
<div class="protonmail_signature_block"
style="font-family: Arial, sans-serif; font-size: 14px;">
<div class="protonmail_signature_block-user"> Best regards,<br>
<br>
David </div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br>
</div>
</div>
<div style="font-family: Arial, sans-serif; font-size: 14px;"><br>
</div>
<div class="protonmail_quote"> On Sunday, February 23rd, 2025 at
5:17 PM, Tobias Klein <a class="moz-txt-link-rfc2396E"
href="mailto:contact@tklein.info" moz-do-not-send="true">&lt;contact@tklein.info&gt;</a>
wrote:<br>
<blockquote class="protonmail_quote" type="cite">
<p>Hi Troy,</p>
<p>I have discovered the root cause of this bug.</p>
<p>There is the following code in osisplain.cpp.<br>
I suspect the uppercasing here has a negative impact
on the overall parsing when stripText() runs?</p>
<div
style="color: #cccccc;background-color: #1f1f1f;font-family: 'Droid Sans Mono', 'monospace', monospace;font-weight: normal;font-size: 14px;line-height: 19px;white-space: pre;"><div><span
style="color: #cccccc;"> </span><span
style="color: #c586c0;">else</span><span
style="color: #cccccc;"> </span><span
style="color: #c586c0;">if</span><span
style="color: #cccccc;"> (</span><span
style="color: #d4d4d4;">!</span><span
style="color: #dcdcaa;">strncmp</span><span
style="color: #cccccc;">(</span><span
style="color: #9cdcfe;">token</span><span
style="color: #cccccc;">, </span><span
style="color: #ce9178;">"/divineName"</span><span
style="color: #cccccc;">, </span><span
style="color: #b5cea8;">11</span><span
style="color: #cccccc;">)) {</span></div><div><span
style="color: #6a9955;"> // Get the end portion of the string, and upper case it</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #569cd6;">char*</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">end</span><span
style="color: #cccccc;"> </span><span
style="color: #d4d4d4;">=</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">buf</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">getRawData</span><span
style="color: #cccccc;">();</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">end</span><span
style="color: #cccccc;"> </span><span
style="color: #d4d4d4;">+=</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">buf</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">size</span><span
style="color: #cccccc;">() </span><span
style="color: #d4d4d4;">-</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">u</span><span
style="color: #cccccc;">-></span><span
style="color: #9cdcfe;">lastTextNode</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">size</span><span
style="color: #cccccc;">();</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;">toupperstr</span><span
style="color: #cccccc;">(</span><span
style="color: #9cdcfe;">end</span><span
style="color: #cccccc;">);</span></div><div><span
style="color: #cccccc;"> }</span></div></div>
<div class="moz-cite-prefix">When I comment this portion
out, the search bug <u>no longer occurs</u> and I
get the correct result, shown below.<br>
<br>
textBuf: For he said, Because the Lord hath sworn that the
Lord will have war with Amalek from generation to
generation. <br>
term: generation to generation<br>
Got 11 results!<br>
Exod 17:16<br>
Isa 13:20<br>
Isa 34:10<br>
Isa 34:17<br>
Isa 51:8<br>
Jer 50:39<br>
Lam 5:19<br>
Dan 4:3<br>
Dan 4:34<br>
Joel 3:20<br>
Luke 1:50</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">So, in the specific case of Exodus
17:16, what the code stumbles over is the &lt;divineName&gt;
tag and the parsing/actions related to it.<br>
Why is the uppercasing in the code above necessary at all?
Shouldn't rendering this element in uppercase be left to
the application software?<br>
<br>
Best regards,<br>
Tobias<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 2/22/25 20:32, Tobias Klein
wrote:<br>
</div>
<blockquote type="cite">
<p>Hi Troy,</p>
<p>so I did a little debugging on this.</p>
<p>The relevant code in swmodule.cpp is shown below. I
added some conditional printouts for Exodus 17:16 to
see what happens there.</p>
<div
style="color: #cccccc;background-color: #1f1f1f;font-family: 'Droid Sans Mono', 'monospace', monospace;font-weight: normal;font-size: 14px;line-height: 19px;white-space: pre;"><div><span
style="color: #cccccc;"> </span><span
style="color: #c586c0;">case</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">SEARCHTYPE_PHRASE</span><span
style="color: #cccccc;">: {</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">textBuf</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;">=</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;">stripText</span><span
style="color: #cccccc;">();</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #c586c0;">if</span><span
style="color: #cccccc;"> ((</span><span
style="color: #9cdcfe;">flags</span><span
style="color: #cccccc;"> </span><span
style="color: #d4d4d4;">&</span><span
style="color: #cccccc;"> </span><span
style="color: #569cd6;">REG_ICASE</span><span
style="color: #cccccc;">) </span><span
style="color: #d4d4d4;">==</span><span
style="color: #cccccc;"> </span><span
style="color: #569cd6;">REG_ICASE</span><span
style="color: #cccccc;">) </span><span
style="color: #9cdcfe;">textBuf</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">toUpper</span><span
style="color: #cccccc;">();</span></div>
<div><span style="color: #cccccc;"> </span><span
style="color: #4ec9b0;">SWKey</span><span
style="color: #569cd6;">*</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">currentKey</span><span
style="color: #cccccc;"> </span><span
style="color: #d4d4d4;">=</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;">getKey</span><span
style="color: #cccccc;">();</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #4ec9b0;">std</span><span
style="color: #cccccc;">::</span><span
style="color: #4ec9b0;">string</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">referenceKey</span><span
style="color: #cccccc;"> </span><span
style="color: #d4d4d4;">=</span><span
style="color: #cccccc;"> </span><span
style="color: #ce9178;">"Exod 17:16"</span><span
style="color: #cccccc;">;</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #c586c0;">if</span><span
style="color: #cccccc;"> (</span><span
style="color: #9cdcfe;">currentKey</span><span
style="color: #cccccc;">-></span><span
style="color: #dcdcaa;">getShortText</span><span
style="color: #cccccc;">() </span><span
style="color: #dcdcaa;">==</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">referenceKey</span><span
style="color: #cccccc;">) {</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #4ec9b0;">std</span><span
style="color: #cccccc;">::</span><span
style="color: #9cdcfe;">cout</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;"><<</span><span
style="color: #cccccc;"> </span><span
style="color: #ce9178;">"textBuf: "</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;"><<</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">textBuf</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">c_str</span><span
style="color: #cccccc;">() </span><span
style="color: #dcdcaa;"><<</span><span
style="color: #cccccc;"> </span><span
style="color: #4ec9b0;">std</span><span
style="color: #cccccc;">::</span><span
style="color: #dcdcaa;">endl</span><span
style="color: #cccccc;">;</span></div><div><span
style="color: #cccccc;"> </span><span
style="color: #4ec9b0;">std</span><span
style="color: #cccccc;">::</span><span
style="color: #9cdcfe;">cout</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;"><<</span><span
style="color: #cccccc;"> </span><span
style="color: #ce9178;">"term: "</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;"><<</span><span
style="color: #cccccc;"> </span><span
style="color: #9cdcfe;">term</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">c_str</span><span
style="color: #cccccc;">() </span><span
style="color: #dcdcaa;"><<</span><span
style="color: #cccccc;"> </span><span
style="color: #4ec9b0;">std</span><span
style="color: #cccccc;">::</span><span
style="color: #dcdcaa;">endl</span><span
style="color: #cccccc;">;</span></div><div><span
style="color: #cccccc;"> }</span></div></div>
<div
style="color: #cccccc;background-color: #1f1f1f;font-family: 'Droid Sans Mono', 'monospace', monospace;font-weight: normal;font-size: 14px;line-height: 19px;white-space: pre;">
<div><span style="color: #cccccc;"> // TKL: This is where the actual search per verse happens</span></div><div> <span
style="color: #9cdcfe;">sres</span><span
style="color: #cccccc;"> </span><span
style="color: #d4d4d4;">=</span><span
style="color: #cccccc;"> </span><span
style="color: #dcdcaa;">strstr</span><span
style="color: #cccccc;">(</span><span
style="color: #9cdcfe;">textBuf</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">c_str</span><span
style="color: #cccccc;">(), </span><span
style="color: #9cdcfe;">term</span><span
style="color: #cccccc;">.</span><span
style="color: #dcdcaa;">c_str</span><span
style="color: #cccccc;">());</span></div></div>
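<p>For reference, here is a minimal standalone sketch of the
phrase test performed above: uppercase the stripped verse
text when REG_ICASE is set, then look for the term as a
plain substring with strstr. It uses std::string instead
of SWBuf, the name phraseMatches is hypothetical, and it
assumes the search term has already been uppercased for
case-insensitive searches.</p>
<pre wrap="" class="moz-quote-pre">
```cpp
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical per-verse phrase test modeled on the SEARCHTYPE_PHRASE case.
// strippedText plays the role of textBuf (the output of stripText()).
// For case-insensitive searches, term is assumed to be uppercased already.
bool phraseMatches(std::string strippedText, const std::string &term, bool icase) {
	if (icase) {
		// Corresponds to textBuf.toUpper() (ASCII-only here).
		for (char &c : strippedText) {
			c = (char)std::toupper((unsigned char)c);
		}
	}
	// Corresponds to strstr(textBuf.c_str(), term.c_str()).
	return std::strstr(strippedText.c_str(), term.c_str()) != nullptr;
}
```
</pre>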
<p>With the modification above, I get the following
output:<br>
<br>
textBuf: For he said, Because the <br>
term: generation to generation<br>
</p>
<p>The full verse content of Exodus 17:16 in KJV is this:<br>
For he said, Because the Lord hath sworn <i>that</i>
the Lord <i>will have</i> war with Amalek from
generation to generation. <br>
<br>
So it seems that the stripText() call strips too much
of the verse content (textBuf) away.<br>
As a result, the strstr call cannot possibly find the
term "generation to generation", because at that point
it is no longer part of the string being searched
(textBuf).</p>
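<p>A minimal sketch of why the truncated textBuf can never
match: if a '\0' lands in the middle of the buffer,
presumably where the buggy toupper wrote its terminator at
the start of "Lord", any C-string search such as strstr()
on textBuf.c_str() stops there. The verse text is
hard-coded for illustration, and the helper names
foundAsCString and simulateBug are hypothetical.</p>
<pre wrap="" class="moz-quote-pre">
```cpp
#include <cstddef>
#include <cstring>

// Verse text as it would look after stripping (ASCII, italics markup removed).
static const char kVerse[] =
	"For he said, Because the Lord hath sworn that the Lord will have war "
	"with Amalek from generation to generation.";

// True if term occurs in buf when buf is read as a C string, i.e. only up to
// the first '\0'. This is how strstr() sees textBuf.c_str().
bool foundAsCString(const char *buf, const char *term) {
	return std::strstr(buf, term) != nullptr;
}

// Simulates the bug: copy the verse into dest (destSize must be > 0), then
// put a terminator where the first "Lord" starts, as the unpatched toupper
// presumably did when called on that text node without a maxlen.
void simulateBug(char *dest, std::size_t destSize) {
	std::strncpy(dest, kVerse, destSize - 1);
	dest[destSize - 1] = '\0';
	if (char *lord = std::strstr(dest, "Lord")) {
		*lord = '\0';  // the rest of the verse is still in memory, but hidden
	}
}
```
</pre>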
<p>Could you look into the behavior of stripText
here?<br>
<br>
Best regards,<br>
Tobias</p>
<div class="moz-cite-prefix">On 2/22/25 15:45, Tobias
Klein wrote:<br>
</div>
<blockquote type="cite">Hi Troy, <br>
<br>
an Ezra Bible App user reported that the phrase search
is not working as expected. <br>
<br>
Here is an example where the results are not as
expected. <br>
<br>
Module: KJV <br>
<br>
Search term: "generation to generation" <br>
<br>
I get the following results from the SWORD engine: <br>
Isa 13:20 <br>
Isa 34:10 <br>
Isa 34:17 <br>
Isa 51:8 <br>
Jer 50:39 <br>
Dan 4:3 <br>
Dan 4:34 <br>
Joel 3:20 <br>
Luke 1:50 <br>
<br>
However, Exodus 17:16 also contains this phrase but is
not in the list of search results. <br>
Could it be related to the way the markup is
structured? <br>
<br>
In Exodus 17:16 [KJV], the markup of this phrase looks
like this: <br>
<br>
&lt;w class="strong:H01755"&gt;from
generation&lt;/w&gt; &lt;w class="strong:H01755"&gt;to
generation&lt;/w&gt; <br>
<br>
This is how I call the search function of the SWORD
engine: <br>
listKey = module->search(searchTerm.c_str(),
int(searchType), flags, scope, 0,
internalModuleSearchProgressCB); <br>
see <a
href="https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178"
class="moz-txt-link-freetext" target="_blank"
rel="noreferrer nofollow noopener"
moz-do-not-send="true">https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178</a><br>
<br>
Have a nice weekend! <br>
<br>
Best regards, <br>
Tobias <br>
<br>
_______________________________________________ <br>
sword-devel mailing list: <a
href="mailto:sword-devel@crosswire.org"
class="moz-txt-link-abbreviated moz-txt-link-freetext"
rel="noreferrer nofollow noopener"
moz-do-not-send="true">sword-devel@crosswire.org</a> <br>
<a
href="http://crosswire.org/mailman/listinfo/sword-devel"
class="moz-txt-link-freetext" target="_blank"
rel="noreferrer nofollow noopener"
moz-do-not-send="true">http://crosswire.org/mailman/listinfo/sword-devel</a>
<br>
Instructions to unsubscribe/change your settings at
above page <br>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
</div>
<br>
</blockquote>
<br>
</blockquote>
</body>
</html>