[sword-devel] Sword phrase search not returning all expected results
Tobias Klein
contact at tklein.info
Sat Mar 1 02:09:22 EST 2025
Hi Troy,
Can this be fixed in SWORD?
This bug impacts the search function quite significantly. I noticed it when
my standard test scenario for search started to fail after my adjustments:
the number of search results for that scenario increased significantly,
and I had to adjust the expected results.
The test scenario searches for "faith" in KJV. Previously (before the
bugfix) I expected 324 search results.
After the bugfix/change mentioned below, there are now 338 search
results. So, because of this bug, the search function misses 14 verses
for this term alone.
Best regards,
Tobias
On 2/23/25 18:38, David Haslam wrote:
> Excellent sleuthing, Tobias !
>
> Best regards,
>
> David
>
> On Sunday, February 23rd, 2025 at 5:17 PM, Tobias Klein
> <contact at tklein.info> wrote:
>>
>> Hi Troy,
>>
>> I have discovered the root cause of this bug.
>>
>> There is the following code in osisplain.cpp.
>> Could the uppercasing here be breaking the overall parsing when
>> stripText() runs?
>>
>> else if (!strncmp(token, "/divineName", 11)) {
>>     // Get the end portion of the string, and upper case it
>>     char *end = buf.getRawData();
>>     end += buf.size() - u->lastTextNode.size();
>>     toupperstr(end);
>> }
>> When I comment this portion out, the search bug _does not occur
>> anymore_ and I get a correct result, see below.
>>
>> textBuf: For he said, Because the Lord hath sworn that the Lord will
>> have war with Amalek from generation to generation.
>> term: generation to generation
>> Got 11 results!
>> Exod 17:16
>> Isa 13:20
>> Isa 34:10
>> Isa 34:17
>> Isa 51:8
>> Jer 50:39
>> Lam 5:19
>> Dan 4:3
>> Dan 4:34
>> Joel 3:20
>> Luke 1:50
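Comparing this list with the nine results from the original report (quoted further down) shows exactly which verses the unpatched search dropped. A minimal C++ sketch of that comparison, with both lists copied from this thread; the helper `missedVerses` and the constant names are illustrative only and not part of SWORD:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the references that appear in `fixed` but not in `buggy`.
// std::set keeps both inputs sorted, as std::set_difference requires.
std::vector<std::string> missedVerses(const std::set<std::string>& fixed,
                                      const std::set<std::string>& buggy) {
    std::vector<std::string> missed;
    std::set_difference(fixed.begin(), fixed.end(),
                        buggy.begin(), buggy.end(),
                        std::back_inserter(missed));
    return missed;
}

// Result lists copied from this thread (KJV, phrase search "generation to generation").
const std::set<std::string> kBuggyResults = {
    "Isa 13:20", "Isa 34:10", "Isa 34:17", "Isa 51:8", "Jer 50:39",
    "Dan 4:3", "Dan 4:34", "Joel 3:20", "Luke 1:50"};

const std::set<std::string> kFixedResults = {
    "Exod 17:16", "Isa 13:20", "Isa 34:10", "Isa 34:17", "Isa 51:8",
    "Jer 50:39", "Lam 5:19", "Dan 4:3", "Dan 4:34", "Joel 3:20", "Luke 1:50"};
```

With these lists, `missedVerses(kFixedResults, kBuggyResults)` returns Exod 17:16 and also Lam 5:19, which appears in the patched results but was likewise missing from the original report.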
>>
>> So, in the specific case of Exodus 17:16, the code stumbles over the
>> <divineName> tag and the parsing/actions related to it.
>> Why is the uppercasing necessary at all in the code above? Shouldn't
>> this be left to the application software in terms of formatting the
>> respective element/tag in uppercase?
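For comparison, here is a simplified sketch of what the quoted snippet appears intended to do: upper-case only the last text node (the content of the just-closed divineName element) and leave the preceding text, and the overall length, untouched. It assumes plain ASCII and uses std::string in place of SWBuf; `upperCaseTail` is a hypothetical name, and this is neither a reconstruction of toupperstr() nor the actual fix.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not SWORD code: upper-cases only the last `nodeLen`
// bytes of `buf` (e.g. the text of a just-closed divineName element).
// ASCII-only simplification: real OSIS text is UTF-8, and a Unicode-aware
// case mapping may change the byte length, which this sketch ignores.
void upperCaseTail(std::string& buf, std::size_t nodeLen) {
    if (nodeLen > buf.size()) {
        nodeLen = buf.size();  // clamp so the start offset never underflows
    }
    const std::size_t start = buf.size() - nodeLen;
    for (std::size_t i = start; i < buf.size(); ++i) {
        buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(buf[i])));
    }
}
```

Applied to "For he said, Because the Lord" with a node length of 4, this yields "For he said, Because the LORD". The truncated textBuf reported in the earlier debugging message ("For he said, Because the") suggests the real code path does not leave the buffer intact in this case.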
>>
>> Best regards,
>> Tobias
>>
>> On 2/22/25 20:32, Tobias Klein wrote:
>>>
>>> Hi Troy,
>>>
>>> so I did a little debugging on this.
>>>
>>> The relevant code in swmodule.cpp is shown below. I added some
>>> conditional printouts for Exodus 17:16 to see what happens there.
>>>
>>> case SEARCHTYPE_PHRASE: {
>>>     textBuf = stripText();
>>>     if ((flags & REG_ICASE) == REG_ICASE) textBuf.toUpper();
>>>     SWKey *currentKey = getKey();
>>>     std::string referenceKey = "Exod 17:16";
>>>     if (currentKey->getShortText() == referenceKey) {
>>>         std::cout << "textBuf: " << textBuf.c_str() << std::endl;
>>>         std::cout << "term: " << term.c_str() << std::endl;
>>>     }
>>>     // TKL: This is where the actual search per verse happens
>>>     sres = strstr(textBuf.c_str(), term.c_str());
>>>
>>> I get the following output based on my modification above:
>>>
>>> textBuf: For he said, Because the
>>> term: generation to generation
>>>
>>> The full verse content of Exodus 17:16 in KJV is this:
>>> For he said, Because the Lord hath sworn /that/ the Lord /will have/
>>> war with Amalek from generation to generation.
>>>
>>> So it seems that the stripText() call strips away too much of the
>>> verse content (textBuf).
>>> As a result, the strstr() call cannot possibly find the term
>>> "generation to generation", because at that point the term is no
>>> longer part of the string being searched (textBuf).
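The failure mode is easy to reproduce in isolation. A minimal sketch of the quoted per-verse step (optional upper-casing, then strstr()), applied once to the full verse and once to the truncated textBuf from the output above. `phraseMatches` is a hypothetical helper, and upper-casing the term inside it is an assumption, since the quoted excerpt only shows textBuf.toUpper().

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Simplified sketch of the phrase-search step, not the actual SWModule code:
// optionally upper-case both strings, then look for the term with strstr().
// Upper-casing the term here is an assumption; the quoted excerpt only
// upper-cases textBuf.
bool phraseMatches(std::string text, std::string term, bool ignoreCase) {
    if (ignoreCase) {
        for (char& c : text) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        for (char& c : term) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return std::strstr(text.c_str(), term.c_str()) != nullptr;
}
```

On the full Exodus 17:16 text the phrase is found; on "For he said, Because the" it cannot be, no matter how the comparison is done.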
>>>
>>> Could you investigate the behavior of stripText() here?
>>>
>>> Best regards,
>>> Tobias
>>>
>>> On 2/22/25 15:45, Tobias Klein wrote:
>>>> Hi Troy,
>>>>
>>>> an Ezra Bible App user reported that the phrase search is not
>>>> working as expected.
>>>>
>>>> Here is an example where the results are not as expected.
>>>>
>>>> Module: KJV
>>>>
>>>> Search term: "generation to generation"
>>>>
>>>> I get the following results from the SWORD engine:
>>>> Isa 13:20
>>>> Isa 34:10
>>>> Isa 34:17
>>>> Isa 51:8
>>>> Jer 50:39
>>>> Dan 4:3
>>>> Dan 4:34
>>>> Joel 3:20
>>>> Luke 1:50
>>>>
>>>> However, the verse Exodus 17:16 also contains this phrase, but is
>>>> not in the list of search results.
>>>> Could it be related to the way the markup is structured?
>>>>
>>>> In Exodus 17:16 [KJV], the markup of the respective phrase looks
>>>> like this:
>>>>
>>>> <w class="strong:H01755">from generation</w> <w
>>>> class="strong:H01755">to generation</w>
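One way to check whether the <w> wrappers themselves could split the phrase is to strip the tags naively and look at the remaining text. This is a deliberately simple sketch: `stripTags` is a hypothetical helper, not SWORD's OSISPlain filter, and it would mishandle a '>' inside an attribute value.

```cpp
#include <string>

// Naive markup stripper for illustration only (not SWORD's OSISPlain filter):
// drops everything between '<' and '>' and keeps the text in between.
std::string stripTags(const std::string& markup) {
    std::string out;
    bool inTag = false;
    for (char c : markup) {
        if (c == '<') {
            inTag = true;
        } else if (c == '>') {
            inTag = false;
        } else if (!inTag) {
            out += c;
        }
    }
    return out;
}
```

For the markup above it returns "from generation to generation", so the word-level markup alone does not break up the phrase.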
>>>>
>>>> This is how I call the search function of the SWORD engine:
>>>> listKey = module->search(searchTerm.c_str(), int(searchType),
>>>> flags, scope, 0, internalModuleSearchProgressCB);
>>>> see
>>>> https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178
>>>>
>>>> Have a nice weekend!
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
>>>