[sword-devel] Sword phrase search not returning all expected results
Tobias Klein
contact at tklein.info
Mon Mar 3 13:52:48 EST 2025
Hi Troy,
Thank you so much!
I am using a GitHub "fork" of SWORD now (based on
crosswire-sword-mirror), in order to maintain a bugfix branch without
all the latest changes.
See
https://github.com/ezra-bible-app/crosswire-sword-mirror/tree/sword-1.9.0-bugfix.
In this fork I have now cherry-picked your bugfix commit from trunk and
can confirm that it works!
This bugfix will be part of the next Ezra Bible App release.
Best regards,
Tobias
On 3/3/25 15:01, Troy A. Griffitts wrote:
>
> Hi Tobias,
>
> I am sorry I haven't had a change to look into this sooner.
>
> Having a quick look, I didn't see the same issue you see:
>
> ```
>
> [tgriffitts at fedora sword]$ cd examples/cmdline/
> [tgriffitts at fedora cmdline]$ make
> [tgriffitts at fedora cmdline]$ ./search KJV "generation to generation"
> [0=================================50===============================100]
> ======================================================================
>
> Exod 17:16; Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Lam
> 5:19; Dan 4:3; Dan 4:34; Joel 3:20; Luke 1:50
>
> Exod 17:16
> Isa 13:20
> Isa 34:10
> Isa 34:17
> Isa 51:8
> Jer 50:39
> Lam 5:19
> Dan 4:3
> Dan 4:34
> Joel 3:20
> Luke 1:50
> [tgriffitts at fedora cmdline]$
>
> ```
>
> Looking into things a bit more:
>
> ```
>
> [tgriffitts at fedora sword]$ ./usrinst.sh
>
> ...
>
> Configuration:
>
> Settings:
> LIBDIR: /usr/lib64
> DEBUG: yes
> PROFILE: no
> BUILD TESTS: yes
> BUILD EXAMPLES: no
> BUILD UTILITIES: yes
> STRIP LOG DEBUG: no
> STRIP LOG INFO: no
>
> Dependencies for standard use:
> REGEX: yes
> ZLIB: yes
> LIBICU: yes
> LIBCURL: yes
> CLUCENE-CORE: yes 2.x
>
> Optional / Experimental:
> LIBCURL SFTP: yes
> BZIP2: no
> XZ: no
> ICUSWORD: no
> ICU-REGEX: yes
> CXX11-REGEX: no
> CXX11-TIME: yes
> XAPIAN-CORE: no
> GAPI: no
>
> ```
>
> If I edit usrinst.sh and remove the comment on the line:
> # --without-icu
>
> Such that my ./usrinst.sh configuration report at the end:
>
> ```
> LIBICU: no
> ```
>
> Then:
> ```
> [tgriffitts at fedora sword]$ make clean
> [tgriffitts at fedora sword]$ make -j
> [tgriffitts at fedora sword]$ cd examples/cmdline
> [tgriffitts at fedora sword]$ make clean
> [tgriffitts at fedora sword]$ make
> [tgriffitts at fedora cmdline]$ ./search KJV "generation to generation"
> [0=================================50===============================100]
> ======================================================================
>
> Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Dan 4:3; Dan
> 4:34; Joel 3:20; Luke 1:50
>
> Isa 13:20
> Isa 34:10
> Isa 34:17
> Isa 51:8
> Jer 50:39
> Dan 4:3
> Dan 4:34
> Joel 3:20
> Luke 1:50
> ```
>
> Bottom line, there was a bug in non-icu toupper when no maxlen was
> passed. Instead of allowing the entire toupper length to be copied to
> the buffer, it copied no characters from the uppercased string to the
> destination and set the terminating null at the start of the
> destination string.
>
> The reason we do the toupper is because the OSISPlain filter does
> double duty. It acts as the stripfilter to prepare the string for
> searching, and it also acts as the render filter when asked to render
> the verse as plaintext. So, the toupper is to change the divineName
> Lord entries to LORD for plaintext output.
>
> Thanks for finding this bug and spending time to pinpoint the place
> were the problem is occurring. Great help. I appreciate you.
>
> Troy
>
> Patch:
>
> Modified: trunk/src/mgr/stringmgr.cpp
> ===================================================================
> --- trunk/src/mgr/stringmgr.cpp 2025-03-03 13:48:49 UTC (rev 3897)
> +++ trunk/src/mgr/stringmgr.cpp 2025-03-03 13:49:36 UTC (rev 3898)
> @@ -238,7 +238,7 @@
> it = toUpperData.find(ch);
> getUTF8FromUniChar(it == toUpperData.end() ? ch : it->second, &text);
> }
> - long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : 0;
> + long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : text.size();
> if (len) memcpy(t, text.c_str(), len);
> t[len] = 0;
> #endif
>
>
>
> On 3/1/25 8:09 AM, Tobias Klein wrote:
>>
>> Hi Troy,
>>
>> can this be fixed in SWORD?
>>
>> This bug impacts the search function quite significantly. I noticed
>> when my standard test scenario for search started to fail after my
>> adjustments.
>> The reason was that the search results for my test scenario
>> significantly increased and I had to adjust the expected results.
>> The test scenario searches for "faith" in KJV. Previously (before the
>> bugfix) I expected 324 search results.
>> After the bugfix/change mentioned below there are now 338 search
>> results. So you see that quite some verses are missed by the search
>> function because of this bug.
>>
>> Best regards,
>> Tobias
>>
>> On 2/23/25 18:38, David Haslam wrote:
>>> Excellent sleuthing, Tobias !
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> Sent with Proton Mail <https://proton.me/mail/home> secure email.
>>>
>>> On Sunday, February 23rd, 2025 at 5:17 PM, Tobias Klein
>>> <contact at tklein.info> wrote:
>>>>
>>>> Hi Troy,
>>>>
>>>> I have discovered the root cause of this bug.
>>>>
>>>> There is the following code in osisplain.cpp.
>>>> I suppose the uppercasing action here has negative impact on the
>>>> overall parsing when the stripText() is running?
>>>>
>>>> elseif(!strncmp(token, "/divineName", 11)) {
>>>> // Get the end portion of the string, and upper case it
>>>> char*end=buf.getRawData();
>>>> end+=buf.size() -u->lastTextNode.size();
>>>> toupperstr(end);
>>>> }
>>>> When I comment this portion out, the search bug _does not occur
>>>> anymore_ and I get a correct result, see below.
>>>>
>>>> textBuf: For he said, Because the Lord hath sworn that the Lord
>>>> will have war with Amalek from generation to generation.
>>>> term: generation to generation
>>>> Got 11 results!
>>>> Exod 17:16
>>>> Isa 13:20
>>>> Isa 34:10
>>>> Isa 34:17
>>>> Isa 51:8
>>>> Jer 50:39
>>>> Lam 5:19
>>>> Dan 4:3
>>>> Dan 4:34
>>>> Joel 3:20
>>>> Luke 1:50
>>>>
>>>> So, what the code stumbles over in the specific case of Exodus
>>>> 17:16 is the <divineName> tag and the parsing / actions related to it.
>>>> Why is the uppercasing necessary at all in the code above?
>>>> Shouldn't this be left to the application software in terms of
>>>> formatting the respective element/tag in uppercase?
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 2/22/25 20:32, Tobias Klein wrote:
>>>>>
>>>>> Hi Troy,
>>>>>
>>>>> so I did a little debugging on this.
>>>>>
>>>>> The respective portion of code in swmodule.cpp is this code below.
>>>>> I added some conditional print outs for Exodus 17:16 to see what
>>>>> happens there.
>>>>>
>>>>> caseSEARCHTYPE_PHRASE: {
>>>>> textBuf=stripText();
>>>>> if((flags®_ICASE) ==REG_ICASE) textBuf.toUpper();
>>>>> SWKey*currentKey=getKey();
>>>>> std::stringreferenceKey="Exod 17:16";
>>>>> if(currentKey->getShortText() ==referenceKey) {
>>>>> std::cout<<"textBuf: "<<textBuf.c_str() <<std::endl;
>>>>> std::cout<<"term: "<<term.c_str() <<std::endl;
>>>>> }
>>>>> // TKL: This is where the actual search per verse happens
>>>>> sres=strstr(textBuf.c_str(), term.c_str());
>>>>>
>>>>> I get the following output based on my modification above:
>>>>>
>>>>> textBuf: For he said, Because the
>>>>> term: generation to generation
>>>>>
>>>>> The full verse content of Exodus 17:16 in KJV is this:
>>>>> For he said, Because the Lord hath sworn /that/ the Lord /will
>>>>> have/ war with Amalek from generation to generation.
>>>>>
>>>>> So ... it seems that the stripText() call strips too much of the
>>>>> content (textBuf) of the verse away.
>>>>> Based on that there is no way for the strstr call to succeed
>>>>> detecting the term "generation to generation", because at that
>>>>> point it is not part of the search string (textBuf) anymore.
>>>>>
>>>>> Could you do some investigation regarding the behavior of
>>>>> stripText here?
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>> On 2/22/25 15:45, Tobias Klein wrote:
>>>>>> Hi Troy,
>>>>>>
>>>>>> an Ezra Bible App user reported that the phrase search is not
>>>>>> working as expected.
>>>>>>
>>>>>> Here is an example where the results are not as expected.
>>>>>>
>>>>>> Module: KJV
>>>>>>
>>>>>> Search term: "generation to generation"
>>>>>>
>>>>>> I get the following results from the SWORD engine:
>>>>>> Isa 13:20
>>>>>> Isa 34:10
>>>>>> Isa 34:17
>>>>>> Isa 51:8
>>>>>> Jer 50:39
>>>>>> Dan 4:3
>>>>>> Dan 4:34
>>>>>> Joel 3:20
>>>>>> Luke 1:50
>>>>>>
>>>>>> However, the verse Exodus 17:16 also contains this phrase, but is
>>>>>> not in the list of search results.
>>>>>> Could it be related to the way how the markup is structured?
>>>>>>
>>>>>> In Exodus 17:16 [KJV], the markup of the respective phrase looks
>>>>>> like this:
>>>>>>
>>>>>> <w class="strong:H01755">from generation</w> <w
>>>>>> class="strong:H01755">to generation</w>
>>>>>>
>>>>>> This is how I call the search function of the SWORD engine:
>>>>>> listKey = module->search(searchTerm.c_str(), int(searchType),
>>>>>> flags, scope, 0, internalModuleSearchProgressCB);
>>>>>> see
>>>>>> https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178
>>>>>>
>>>>>> Have a nice weekend!
>>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page
>>>
>>>
>>> _______________________________________________
>>> sword-devel mailing list:sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>
>> _______________________________________________
>> sword-devel mailing list:sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
> _______________________________________________
> sword-devel mailing list:sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://crosswire.org/pipermail/sword-devel/attachments/20250303/f118dcb0/attachment-0001.htm>
More information about the sword-devel
mailing list