[sword-devel] Sword phrase search not returning all expected results

Tobias Klein contact at tklein.info
Mon Mar 3 13:52:48 EST 2025


Hi Troy,

Thank you so much!

I am using a GitHub "fork" of SWORD now (based on 
crosswire-sword-mirror), in order to maintain a bugfix branch without 
all the latest changes.
See 
https://github.com/ezra-bible-app/crosswire-sword-mirror/tree/sword-1.9.0-bugfix.
In this fork I have now cherry-picked your bugfix commit from trunk and 
can confirm that it works!

This bugfix will be part of the next Ezra Bible App release.

Best regards,
Tobias

On 3/3/25 15:01, Troy A. Griffitts wrote:
>
> Hi Tobias,
>
> I am sorry I haven't had a change to look into this sooner.
>
> Having a quick look, I didn't see the same issue you see:
>
> ```
>
> [tgriffitts at fedora sword]$ cd examples/cmdline/
> [tgriffitts at fedora cmdline]$ make
> [tgriffitts at fedora cmdline]$ ./search KJV "generation to generation"
> [0=================================50===============================100]
>  ======================================================================
>
> Exod 17:16; Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Lam 
> 5:19; Dan 4:3; Dan 4:34; Joel 3:20; Luke 1:50
>
> Exod 17:16
> Isa 13:20
> Isa 34:10
> Isa 34:17
> Isa 51:8
> Jer 50:39
> Lam 5:19
> Dan 4:3
> Dan 4:34
> Joel 3:20
> Luke 1:50
> [tgriffitts at fedora cmdline]$
>
> ```
>
> Looking into things a bit more:
>
> ```
>
> [tgriffitts at fedora sword]$ ./usrinst.sh
>
> ...
>
> Configuration:
>
>  Settings:
>      LIBDIR:               /usr/lib64
>      DEBUG:                yes
>      PROFILE:              no
>      BUILD TESTS:          yes
>      BUILD EXAMPLES:       no
>      BUILD UTILITIES:      yes
>      STRIP LOG DEBUG:      no
>      STRIP LOG INFO:       no
>
>  Dependencies for standard use:
>      REGEX:                yes
>      ZLIB:                 yes
>      LIBICU:               yes
>      LIBCURL:              yes
>      CLUCENE-CORE:         yes 2.x
>
>  Optional / Experimental:
>      LIBCURL SFTP:         yes
>      BZIP2:                no
>      XZ:                   no
>      ICUSWORD:             no
>      ICU-REGEX:            yes
>      CXX11-REGEX:          no
>      CXX11-TIME:           yes
>      XAPIAN-CORE:          no
>      GAPI:                 no
>
> ```
>
> If I edit usrinst.sh and remove the comment on the line:
> # --without-icu
>
> Such that my ./usrinst.sh configuration report at the end:
>
> ```
>      LIBICU:               no
> ```
>
> Then:
> ```
> [tgriffitts at fedora sword]$ make clean
> [tgriffitts at fedora sword]$ make -j
> [tgriffitts at fedora sword]$ cd examples/cmdline
> [tgriffitts at fedora sword]$ make clean
> [tgriffitts at fedora sword]$ make
> [tgriffitts at fedora cmdline]$ ./search KJV "generation to generation"
> [0=================================50===============================100]
>  ======================================================================
>
> Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Dan 4:3; Dan 
> 4:34; Joel 3:20; Luke 1:50
>
> Isa 13:20
> Isa 34:10
> Isa 34:17
> Isa 51:8
> Jer 50:39
> Dan 4:3
> Dan 4:34
> Joel 3:20
> Luke 1:50
> ```
>
> Bottom line, there was a bug in non-icu toupper when no maxlen was 
> passed.  Instead of allowing the entire toupper length to be copied to 
> the buffer, it copied no characters from the uppercased string to the 
> destination and set the terminating null at the start of the 
> destination string.
>
> The reason we do the toupper is because the OSISPlain filter does 
> double duty.  It acts as the stripfilter to prepare the string for 
> searching, and it also acts as the render filter when asked to render 
> the verse as plaintext.  So, the toupper is to change the divineName 
> Lord entries to LORD for plaintext output.
>
> Thanks for finding this bug and spending time to pinpoint the place 
> were the problem is occurring.  Great help.  I appreciate you.
>
> Troy
>
> Patch:
>
> Modified: trunk/src/mgr/stringmgr.cpp
> ===================================================================
> --- trunk/src/mgr/stringmgr.cpp	2025-03-03 13:48:49 UTC (rev 3897)
> +++ trunk/src/mgr/stringmgr.cpp	2025-03-03 13:49:36 UTC (rev 3898)
> @@ -238,7 +238,7 @@
>   		it = toUpperData.find(ch);
>   		getUTF8FromUniChar(it == toUpperData.end() ? ch : it->second, &text);
>   	}
> -	long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : 0;
> +	long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : text.size();
>   	if (len) memcpy(t, text.c_str(), len);
>   	t[len] = 0;
>   #endif
>
>
>
> On 3/1/25 8:09 AM, Tobias Klein wrote:
>>
>> Hi Troy,
>>
>> can this be fixed in SWORD?
>>
>> This bug impacts the search function quite significantly. I noticed 
>> when my standard test scenario for search started to fail after my 
>> adjustments.
>> The reason was that the search results for my test scenario 
>> significantly increased and I had to adjust the expected results.
>> The test scenario searches for "faith" in KJV. Previously (before the 
>> bugfix) I expected 324 search results.
>> After the bugfix/change mentioned below there are now 338 search 
>> results. So you see that quite some verses are missed by the search 
>> function because of this bug.
>>
>> Best regards,
>> Tobias
>>
>> On 2/23/25 18:38, David Haslam wrote:
>>> Excellent sleuthing, Tobias !
>>>
>>> Best regards,
>>>
>>> David
>>>
>>> Sent with Proton Mail <https://proton.me/mail/home> secure email.
>>>
>>> On Sunday, February 23rd, 2025 at 5:17 PM, Tobias Klein 
>>> <contact at tklein.info> wrote:
>>>>
>>>> Hi Troy,
>>>>
>>>> I have discovered the root cause of this bug.
>>>>
>>>> There is the following code in osisplain.cpp.
>>>> I suppose the uppercasing action here has negative impact on the 
>>>> overall parsing when the stripText() is running?
>>>>
>>>> elseif(!strncmp(token, "/divineName", 11)) {
>>>> // Get the end portion of the string, and upper case it
>>>> char*end=buf.getRawData();
>>>> end+=buf.size() -u->lastTextNode.size();
>>>> toupperstr(end);
>>>> }
>>>> When I comment this portion out, the search bug _does not occur 
>>>> anymore_ and I get a correct result, see below.
>>>>
>>>> textBuf: For he said, Because the Lord hath sworn that the Lord 
>>>> will have war with Amalek from generation to generation.
>>>> term: generation to generation
>>>> Got 11 results!
>>>> Exod 17:16
>>>> Isa 13:20
>>>> Isa 34:10
>>>> Isa 34:17
>>>> Isa 51:8
>>>> Jer 50:39
>>>> Lam 5:19
>>>> Dan 4:3
>>>> Dan 4:34
>>>> Joel 3:20
>>>> Luke 1:50
>>>>
>>>> So, what the code stumbles over in the specific case of Exodus 
>>>> 17:16 is the <divineName> tag and the parsing / actions related to it.
>>>> Why is the uppercasing necessary at all in the code above? 
>>>> Shouldn't this be left to the application software in terms of 
>>>> formatting the respective element/tag in uppercase?
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 2/22/25 20:32, Tobias Klein wrote:
>>>>>
>>>>> Hi Troy,
>>>>>
>>>>> so I did a little debugging on this.
>>>>>
>>>>> The respective portion of code in swmodule.cpp is this code below. 
>>>>> I added some conditional print outs for Exodus 17:16 to see what 
>>>>> happens there.
>>>>>
>>>>> caseSEARCHTYPE_PHRASE: {
>>>>> textBuf=stripText();
>>>>> if((flags&REG_ICASE) ==REG_ICASE) textBuf.toUpper();
>>>>> SWKey*currentKey=getKey();
>>>>> std::stringreferenceKey="Exod 17:16";
>>>>> if(currentKey->getShortText() ==referenceKey) {
>>>>> std::cout<<"textBuf: "<<textBuf.c_str() <<std::endl;
>>>>> std::cout<<"term: "<<term.c_str() <<std::endl;
>>>>> }
>>>>> // TKL: This is where the actual search per verse happens
>>>>> sres=strstr(textBuf.c_str(), term.c_str());
>>>>>
>>>>> I get the following output based on my modification above:
>>>>>
>>>>> textBuf: For he said, Because the
>>>>> term: generation to generation
>>>>>
>>>>> The full verse content of Exodus 17:16 in KJV is this:
>>>>> For he said, Because the Lord hath sworn /that/ the Lord /will 
>>>>> have/ war with Amalek from generation to generation.
>>>>>
>>>>> So ... it seems that the stripText() call strips too much of the 
>>>>> content (textBuf) of the verse away.
>>>>> Based on that there is no way for the strstr call to succeed 
>>>>> detecting the term "generation to generation", because at that 
>>>>> point it is not part of the search string (textBuf) anymore.
>>>>>
>>>>> Could you do some investigation regarding the behavior of 
>>>>> stripText here?
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>> On 2/22/25 15:45, Tobias Klein wrote:
>>>>>> Hi Troy,
>>>>>>
>>>>>> an Ezra Bible App user reported that the phrase search is not 
>>>>>> working as expected.
>>>>>>
>>>>>> Here is an example where the results are not as expected.
>>>>>>
>>>>>> Module: KJV
>>>>>>
>>>>>> Search term: "generation to generation"
>>>>>>
>>>>>> I get the following results from the SWORD engine:
>>>>>> Isa 13:20
>>>>>> Isa 34:10
>>>>>> Isa 34:17
>>>>>> Isa 51:8
>>>>>> Jer 50:39
>>>>>> Dan 4:3
>>>>>> Dan 4:34
>>>>>> Joel 3:20
>>>>>> Luke 1:50
>>>>>>
>>>>>> However, the verse Exodus 17:16 also contains this phrase, but is 
>>>>>> not in the list of search results.
>>>>>> Could it be related to the way how the markup is structured?
>>>>>>
>>>>>> In Exodus 17:16 [KJV], the markup of the respective phrase looks 
>>>>>> like this:
>>>>>>
>>>>>> <w class="strong:H01755">from generation</w> <w 
>>>>>> class="strong:H01755">to generation</w>
>>>>>>
>>>>>> This is how I call the search function of the SWORD engine:
>>>>>> listKey = module->search(searchTerm.c_str(), int(searchType), 
>>>>>> flags, scope, 0, internalModuleSearchProgressCB);
>>>>>> see 
>>>>>> https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178
>>>>>>
>>>>>> Have a nice weekend!
>>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page
>>>
>>>
>>> _______________________________________________
>>> sword-devel mailing list:sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>
>> _______________________________________________
>> sword-devel mailing list:sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
> _______________________________________________
> sword-devel mailing list:sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://crosswire.org/pipermail/sword-devel/attachments/20250303/f118dcb0/attachment-0001.htm>


More information about the sword-devel mailing list