[sword-devel] Sword phrase search not returning all expected results

Tobias Klein contact at tklein.info
Mon Mar 3 14:37:09 EST 2025


What I forgot to say:
I'm building SWORD without ICU on all platforms. And that fits your 
explanation about the root cause.

I first started like this (removing the ICU linking) on Android, because it 
would have been too cumbersome and later synced the build configuration for 
Linux, Windows, macOS as well.

Best regards,
Tobias

Am 3. März 2025 19:53:33 schrieb Tobias Klein <contact at tklein.info>:
> Hi Troy,
> Thank you so much!
> I am using a GitHub "fork" of SWORD now (based on crosswire-sword-mirror), 
> in order to maintain a bugfix branch without all the latest changes.
> See
> https://github.com/ezra-bible-app/crosswire-sword-mirror/tree/sword-1.9.0-bugfix.
> In this fork I have now cherry-picked your bugfix commit from trunk and can 
> confirm that it works!
>
> This bugfix will be part of the next Ezra Bible App release.
> Best regards,
> Tobias
> On 3/3/25 15:01, Troy A. Griffitts wrote:
>> Hi Tobias,
>> I am sorry I haven't had a change to look into this sooner.
>> Having a quick look, I didn't see the same issue you see:
>> ```
>> [tgriffitts at fedora sword]$ cd examples/cmdline/
>> [tgriffitts at fedora cmdline]$ make
>> [tgriffitts at fedora cmdline]$ ./search KJV "generation to generation"
>> [0=================================50===============================100]
>> ======================================================================
>>
>> Exod 17:16; Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Lam 5:19; 
>> Dan 4:3; Dan 4:34; Joel 3:20; Luke 1:50
>>
>> Exod 17:16
>> Isa 13:20
>> Isa 34:10
>> Isa 34:17
>> Isa 51:8
>> Jer 50:39
>> Lam 5:19
>> Dan 4:3
>> Dan 4:34
>> Joel 3:20
>> Luke 1:50
>> [tgriffitts at fedora cmdline]$
>>
>> ```Looking into things a bit more:
>> ```
>> [tgriffitts at fedora sword]$ ./usrinst.sh
>> ...
>> Configuration:
>>
>> Settings:
>> LIBDIR:               /usr/lib64
>> DEBUG:                yes
>> PROFILE:              no
>> BUILD TESTS:          yes
>> BUILD EXAMPLES:       no
>> BUILD UTILITIES:      yes
>> STRIP LOG DEBUG:      no
>> STRIP LOG INFO:       no
>>
>> Dependencies for standard use:
>> REGEX:                yes
>> ZLIB:                 yes
>> LIBICU:               yes
>> LIBCURL:              yes
>> CLUCENE-CORE:         yes 2.x
>>
>> Optional / Experimental:
>> LIBCURL SFTP:         yes
>> BZIP2:                no
>> XZ:                   no
>> ICUSWORD:             no
>> ICU-REGEX:            yes
>> CXX11-REGEX:          no
>> CXX11-TIME:           yes
>> XAPIAN-CORE:          no
>> GAPI:                 no
>> ```
>> If I edit usrinst.sh and remove the comment on the line:
>> # --without-icu
>>
>> Such that my ./usrinst.sh configuration report at the end:
>>
>> ```
>> LIBICU:               no
>> ```
>>
>> Then:
>> ```
>> [tgriffitts at fedora sword]$ make clean[tgriffitts at fedora sword]$ make -j
>> [tgriffitts at fedora sword]$ cd examples/cmdline
>> [tgriffitts at fedora sword]$ make clean
>> [tgriffitts at fedora sword]$ make
>> [tgriffitts at fedora cmdline]$ ./search KJV "generation to generation"
>> [0=================================50===============================100]
>> ======================================================================
>>
>> Isa 13:20; Isa 34:10; Isa 34:17; Isa 51:8; Jer 50:39; Dan 4:3; Dan 4:34; 
>> Joel 3:20; Luke 1:50
>>
>> Isa 13:20
>> Isa 34:10
>> Isa 34:17
>> Isa 51:8
>> Jer 50:39
>> Dan 4:3
>> Dan 4:34
>> Joel 3:20
>> Luke 1:50
>> ```
>>
>> Bottom line, there was a bug in non-icu toupper when no maxlen was passed.  
>> Instead of allowing the entire toupper length to be copied to the buffer, 
>> it copied no characters from the uppercased string to the destination and 
>> set the terminating null at the start of the destination string.
>>
>> The reason we do the toupper is because the OSISPlain filter does double 
>> duty.  It acts as the stripfilter to prepare the string for searching, and 
>> it also acts as the render filter when asked to render the verse as 
>> plaintext.  So, the toupper is to change the divineName Lord entries to 
>> LORD for plaintext output.
>>
>> Thanks for finding this bug and spending time to pinpoint the place were 
>> the problem is occurring.  Great help.  I appreciate you.
>>
>> Troy
>>
>> Patch:
>>
>> Modified: trunk/src/mgr/stringmgr.cpp
>> ===================================================================
>> --- trunk/src/mgr/stringmgr.cpp	2025-03-03 13:48:49 UTC (rev 3897)
>> +++ trunk/src/mgr/stringmgr.cpp	2025-03-03 13:49:36 UTC (rev 3898)
>> @@ -238,7 +238,7 @@ it = toUpperData.find(ch); getUTF8FromUniChar(it == 
>> toUpperData.end() ? ch : it->second, &text); }
>> -	long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : 0;
>> +	long len = maxlen ? (text.size() < maxlen ? text.size() : (maxlen - 1)) : 
>> text.size(); if (len) memcpy(t, text.c_str(), len); t[len] = 0; #endif
>>
>>
>>
>> On 3/1/25 8:09 AM, Tobias Klein wrote:
>>> Hi Troy,
>>> can this be fixed in SWORD?
>>>
>>> This bug impacts the search function quite significantly. I noticed when my 
>>> standard test scenario for search started to fail after my adjustments.
>>> The reason was that the search results for my test scenario significantly 
>>> increased and I had to adjust the expected results.
>>> The test scenario searches for "faith" in KJV. Previously (before the 
>>> bugfix) I expected 324 search results.
>>> After the bugfix/change mentioned below there are now 338 search results. 
>>> So you see that quite some verses are missed by the search function because 
>>> of this bug.
>>>
>>> Best regards,
>>> Tobias
>>> On 2/23/25 18:38, David Haslam wrote:
>>>> Excellent sleuthing, Tobias !
>>>>
>>>> Best regards,
>>>>
>>>> David
>>>> Sent with Proton Mail secure email.
>>>>
>>>> On Sunday, February 23rd, 2025 at 5:17 PM, Tobias Klein 
>>>> <contact at tklein.info> wrote:
>>>>
>>>>> Hi Troy,
>>>>> I have discovered the root cause of this bug.
>>>>> There is the following code in osisplain.cpp.
>>>>> I suppose the uppercasing action here has negative impact on the overall 
>>>>> parsing when the stripText() is running? else if (!strncmp(token, 
>>>>> "/divineName", 11)) {
>>>>> // Get the end portion of the string, and upper case it
>>>>> char* end = buf.getRawData();
>>>>> end += buf.size() - u->lastTextNode.size();
>>>>> toupperstr(end);
>>>>> }
>>>>> When I comment this portion out, the search bug does not occur anymore and 
>>>>> I get a correct result, see below.
>>>>>
>>>>> textBuf: For he said, Because the Lord hath sworn that the Lord will have 
>>>>> war with Amalek from generation to generation.
>>>>> term: generation to generation
>>>>> Got 11 results!
>>>>> Exod 17:16
>>>>> Isa 13:20
>>>>> Isa 34:10
>>>>> Isa 34:17
>>>>> Isa 51:8
>>>>> Jer 50:39
>>>>> Lam 5:19
>>>>> Dan 4:3
>>>>> Dan 4:34
>>>>> Joel 3:20
>>>>> Luke 1:50
>>>>> So, what the code stumbles over in the specific case of Exodus 17:16 is the 
>>>>> <divineName> tag and the parsing / actions related to it.
>>>>> Why is the uppercasing necessary at all in the code above? Shouldn't this 
>>>>> be left to the application software in terms of formatting the respective 
>>>>> element/tag in uppercase?
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>> On 2/22/25 20:32, Tobias Klein wrote:
>>>>>> Hi Troy,
>>>>>> so I did a little debugging on this.
>>>>>> The respective portion of code in swmodule.cpp is this code below. I added 
>>>>>> some conditional print outs for Exodus 17:16 to see what happens there.
>>>>>> case SEARCHTYPE_PHRASE: {
>>>>>> textBuf = stripText();
>>>>>> if ((flags & REG_ICASE) == REG_ICASE) textBuf.toUpper();
>>>>>> SWKey* currentKey = getKey();
>>>>>> std::string referenceKey = "Exod 17:16";
>>>>>> if (currentKey->getShortText() == referenceKey) {
>>>>>> std::cout << "textBuf: " << textBuf.c_str() << std::endl;
>>>>>> std::cout << "term: " << term.c_str() << std::endl;
>>>>>> }
>>>>>> // TKL: This is where the actual search per verse happens
>>>>>> sres = strstr(textBuf.c_str(), term.c_str());
>>>>>> I get the following output based on my modification above:
>>>>>>
>>>>>> textBuf: For he said, Because the
>>>>>> term: generation to generation
>>>>>> The full verse content of Exodus 17:16 in KJV is this:
>>>>>> For he said, Because the Lord hath sworn that the Lord will have war with 
>>>>>> Amalek from generation to generation.
>>>>>>
>>>>>> So ... it seems that the stripText() call strips too much of the content 
>>>>>> (textBuf) of the verse away.
>>>>>> Based on that there is no way for the strstr call to succeed detecting the 
>>>>>> term "generation to generation", because at that point it is not part of 
>>>>>> the search string (textBuf) anymore.Could you do some investigation 
>>>>>> regarding the behavior of stripText here?
>>>>>>
>>>>>> Best regards,
>>>>>> TobiasOn 2/22/25 15:45, Tobias Klein wrote:
>>>>>>> Hi Troy,
>>>>>>>
>>>>>>> an Ezra Bible App user reported that the phrase search is not working as 
>>>>>>> expected.
>>>>>>>
>>>>>>> Here is an example where the results are not as expected.
>>>>>>>
>>>>>>> Module: KJV
>>>>>>>
>>>>>>> Search term: "generation to generation"
>>>>>>>
>>>>>>> I get the following results from the SWORD engine:
>>>>>>> Isa 13:20
>>>>>>> Isa 34:10
>>>>>>> Isa 34:17
>>>>>>> Isa 51:8
>>>>>>> Jer 50:39
>>>>>>> Dan 4:3
>>>>>>> Dan 4:34
>>>>>>> Joel 3:20
>>>>>>> Luke 1:50
>>>>>>>
>>>>>>> However, the verse Exodus 17:16 also contains this phrase, but is not in 
>>>>>>> the list of search results.
>>>>>>> Could it be related to the way how the markup is structured?
>>>>>>>
>>>>>>> In Exodus 17:16 [KJV], the markup of the respective phrase looks like this:
>>>>>>>
>>>>>>> <w class="strong:H01755">from generation</w> <w class="strong:H01755">to 
>>>>>>> generation</w>
>>>>>>>
>>>>>>> This is how I call the search function of the SWORD engine:
>>>>>>> listKey = module->search(searchTerm.c_str(), int(searchType), flags, scope, 
>>>>>>> 0, internalModuleSearchProgressCB);
>>>>>>> see 
>>>>>>> https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178
>>>>>>>
>>>>>>> Have a nice weekend!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Tobias
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>
>>>>
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://crosswire.org/pipermail/sword-devel/attachments/20250303/c2ba0bec/attachment-0001.htm>


More information about the sword-devel mailing list