[sword-devel] Sword phrase search not returning all expected results

Tobias Klein contact at tklein.info
Sat Mar 1 04:09:01 EST 2025


Hi David,

When I perform a Lucene search for "faith" in the KJV with Xiphos 4.2.1 
on Linux, I get 341 results.
When I perform an exact phrase search in the same environment, I get 
338 results (exactly as in Ezra after the bugfix).

Best regards,
Tobias

On 3/1/25 09:45, David Haslam wrote:
> Hi Tobias,
>
> A Lucene search for 'faith' in the KJV module using *Xiphos* returns 
> 231 locations.
> _Aside_: Only 2 of these locations are in the OT !!!
>
> I'm sure there are verses that have the word repeated, as the *whole 
> word* occurs 247 times.
> (Search results from the plain text file output by *diatheke*)
>
> If one drops the *whole word* criterion, the total leaps to 362, as 
> words such as 'faithful', 'faithfulness', 'faithless', etc., are then 
> included.
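
The difference between the whole-word and the substring counts described above can be reproduced with a small sketch. It assumes plain ASCII text and case-sensitive matching; `countOccurrences` and `isWordChar` are hypothetical helpers written for illustration and are not part of SWORD, Xiphos, or diatheke.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: treat ASCII letters as word characters.
static bool isWordChar(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: count occurrences of `word` in `text`.
// With wholeWord == true, a match only counts when it is not directly
// adjacent to another letter (so "faithful" does not count for "faith").
std::size_t countOccurrences(const std::string &text,
                             const std::string &word,
                             bool wholeWord) {
    if (word.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(word);
         pos != std::string::npos;
         pos = text.find(word, pos + 1)) {
        const std::size_t end = pos + word.size();
        const bool leftOk = (pos == 0) || !isWordChar(text[pos - 1]);
        const bool rightOk = (end == text.size()) || !isWordChar(text[end]);
        if (!wholeWord || (leftOk && rightOk)) ++count;
    }
    return count;
}
```

Note that this counts occurrences, not verses, so a verse that repeats the word contributes more than once, which matches the distinction drawn above between locations and occurrences.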
>
> Best regards,
>
> David
>
>
> On Saturday, March 1st, 2025 at 7:09 AM, Tobias Klein 
> <contact at tklein.info> wrote:
>>
>> Hi Troy,
>>
>> can this be fixed in SWORD?
>>
>> This bug impacts the search function quite significantly. I noticed it 
>> when my standard test scenario for search started to fail after my 
>> adjustments.
>> The reason was that the number of search results for my test scenario 
>> increased significantly and I had to adjust the expected results.
>> The test scenario searches for "faith" in KJV. Previously (before the 
>> bugfix) I expected 324 search results.
>> After the bugfix/change mentioned below there are now 338 search 
>> results. So you can see that quite a few verses are missed by the search 
>> function because of this bug.
>>
>> Best regards,
>> Tobias
>>
>> On 2/23/25 18:38, David Haslam wrote:
>>> Excellent sleuthing, Tobias !
>>>
>>> Best regards,
>>>
>>> David
>>>
>>>
>>> On Sunday, February 23rd, 2025 at 5:17 PM, Tobias Klein 
>>> <contact at tklein.info> wrote:
>>>>
>>>> Hi Troy,
>>>>
>>>> I have discovered the root cause of this bug.
>>>>
>>>> The following code is in osisplain.cpp.
>>>> I suppose the uppercasing action here has a negative impact on the 
>>>> overall parsing when stripText() is running?
>>>>
>>>> else if (!strncmp(token, "/divineName", 11)) {
>>>>     // Get the end portion of the string, and upper case it
>>>>     char *end = buf.getRawData();
>>>>     end += buf.size() - u->lastTextNode.size();
>>>>     toupperstr(end);
>>>> }
>>>> When I comment this portion out, the search bug _does not occur 
>>>> anymore_ and I get a correct result, see below.
>>>>
>>>> textBuf: For he said, Because the Lord hath sworn that the Lord 
>>>> will have war with Amalek from generation to generation.
>>>> term: generation to generation
>>>> Got 11 results!
>>>> Exod 17:16
>>>> Isa 13:20
>>>> Isa 34:10
>>>> Isa 34:17
>>>> Isa 51:8
>>>> Jer 50:39
>>>> Lam 5:19
>>>> Dan 4:3
>>>> Dan 4:34
>>>> Joel 3:20
>>>> Luke 1:50
>>>>
>>>> So, what the code stumbles over in the specific case of Exodus 
>>>> 17:16 is the <divineName> tag and the parsing / actions related to it.
>>>> Why is the uppercasing necessary at all in the code above? 
>>>> Shouldn't this be left to the application software in terms of 
>>>> formatting the respective element/tag in uppercase?
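
The quoted snippet uppercases the tail of the buffer by computing an offset from two sizes. The sketch below shows that same idea on a `std::string`, with an explicit bounds check added. It is an illustration only: `uppercaseTail` is a hypothetical helper, `std::string` stands in for SWORD's buffer type, and nothing here is meant as a diagnosis of what actually goes wrong in osisplain.cpp.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of SWORD: uppercase the last `tailLen`
// bytes of `buf` in place (ASCII only). Returns false and leaves `buf`
// untouched when `tailLen` exceeds the buffer size, because the unsigned
// subtraction buf.size() - tailLen would otherwise wrap around.
bool uppercaseTail(std::string &buf, std::size_t tailLen) {
    if (tailLen > buf.size()) return false;
    for (std::size_t i = buf.size() - tailLen; i < buf.size(); ++i) {
        buf[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(buf[i])));
    }
    return true;
}
```

Called with the buffer "the Lord" and a tail length of 4, the helper turns the buffer into "the LORD"; called with a tail length larger than the buffer, it reports failure instead of touching memory outside the string.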
>>>>
>>>> Best regards,
>>>> Tobias
>>>>
>>>> On 2/22/25 20:32, Tobias Klein wrote:
>>>>>
>>>>> Hi Troy,
>>>>>
>>>>> so I did a little debugging on this.
>>>>>
>>>>> The relevant portion of code in swmodule.cpp is shown below. 
>>>>> I added some conditional printouts for Exodus 17:16 to see what 
>>>>> happens there.
>>>>>
>>>>> case SEARCHTYPE_PHRASE: {
>>>>>     textBuf = stripText();
>>>>>     if ((flags & REG_ICASE) == REG_ICASE) textBuf.toUpper();
>>>>>     SWKey *currentKey = getKey();
>>>>>     std::string referenceKey = "Exod 17:16";
>>>>>     if (currentKey->getShortText() == referenceKey) {
>>>>>         std::cout << "textBuf: " << textBuf.c_str() << std::endl;
>>>>>         std::cout << "term: " << term.c_str() << std::endl;
>>>>>     }
>>>>>     // TKL: This is where the actual search per verse happens
>>>>>     sres = strstr(textBuf.c_str(), term.c_str());
>>>>>
>>>>> I get the following output based on my modification above:
>>>>>
>>>>> textBuf: For he said, Because the
>>>>> term: generation to generation
>>>>>
>>>>> The full verse content of Exodus 17:16 in KJV is this:
>>>>> For he said, Because the Lord hath sworn /that/ the Lord /will 
>>>>> have/ war with Amalek from generation to generation.
>>>>>
>>>>> So ... it seems that the stripText() call strips away too much of 
>>>>> the verse content (textBuf).
>>>>> Given that, there is no way for the strstr call to succeed in 
>>>>> detecting the term "generation to generation", because at that 
>>>>> point it is no longer part of the string being searched (textBuf).
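
The effect described above can be reproduced in isolation. The following is a minimal sketch, assuming ASCII text and using `std::string` in place of SWORD's own buffer type; `toUpperCopy` and `phraseMatches` are hypothetical stand-ins for the per-verse phrase check and are not SWORD functions.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper: return an uppercased copy of the input (ASCII only).
static std::string toUpperCopy(std::string s) {
    for (char &c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical stand-in for the per-verse phrase check: a case-insensitive
// substring test using strstr on the uppercased verse text and term.
bool phraseMatches(const std::string &verseText, const std::string &term) {
    const std::string text = toUpperCopy(verseText);
    const std::string needle = toUpperCopy(term);
    return std::strstr(text.c_str(), needle.c_str()) != nullptr;
}
```

With the full verse text of Exodus 17:16 as input, `phraseMatches(verse, "generation to generation")` succeeds. With the truncated buffer from the debug output ("For he said, Because the "), it fails, because the phrase is simply no longer in the haystack by the time strstr runs.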
>>>>>
>>>>> Could you do some investigation regarding the behavior of 
>>>>> stripText here?
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>> On 2/22/25 15:45, Tobias Klein wrote:
>>>>>> Hi Troy,
>>>>>>
>>>>>> an Ezra Bible App user reported that the phrase search is not 
>>>>>> working as expected.
>>>>>>
>>>>>> Here is an example where the results are not as expected.
>>>>>>
>>>>>> Module: KJV
>>>>>>
>>>>>> Search term: "generation to generation"
>>>>>>
>>>>>> I get the following results from the SWORD engine:
>>>>>> Isa 13:20
>>>>>> Isa 34:10
>>>>>> Isa 34:17
>>>>>> Isa 51:8
>>>>>> Jer 50:39
>>>>>> Dan 4:3
>>>>>> Dan 4:34
>>>>>> Joel 3:20
>>>>>> Luke 1:50
>>>>>>
>>>>>> However, the verse Exodus 17:16 also contains this phrase, but is 
>>>>>> not in the list of search results.
>>>>>> Could it be related to the way the markup is structured?
>>>>>>
>>>>>> In Exodus 17:16 [KJV], the markup of the respective phrase looks 
>>>>>> like this:
>>>>>>
>>>>>> <w class="strong:H01755">from generation</w> <w 
>>>>>> class="strong:H01755">to generation</w>
>>>>>>
>>>>>> This is how I call the search function of the SWORD engine:
>>>>>> listKey = module->search(searchTerm.c_str(), int(searchType), 
>>>>>> flags, scope, 0, internalModuleSearchProgressCB);
>>>>>> see 
>>>>>> https://github.com/ezra-bible-app/node-sword-interface/blob/master/src/sword_backend/module_search.cpp#L178
>>>>>>
>>>>>> Have a nice weekend!
>>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>