[sword-devel] Search Bug?

Patrick Stephan pstephan1187 at gmail.com
Wed Apr 19 11:32:07 EDT 2023


I am new to C/C++ and to this library, so I could be misunderstanding how the module search system works, but I think I may have found a bug.

Here is my code:
```
#include <iostream>
#include <vector>
#include "swmgr.h"
#include "swmodule.h"
#include "markupfiltmgr.h"
#include "modules.h"
#include "php_sword.h"

#if defined(USECXX11REGEX)
#include <regex>
#ifndef REG_ICASE
#define REG_ICASE std::regex::icase
#endif
#elif defined(USEICUREGEX)
#include <unicode/regex.h>
#ifndef REG_ICASE
#define REG_ICASE UREGEX_CASE_INSENSITIVE
#endif
#else
#include <regex.h>
#endif

using namespace sword;

//  unrelated function here...

int sword_search_module(char* module_name, char* search_string) {
    SWMgr mgr(new MarkupFilterMgr(FMT_XHTML));

//  mgr.setGlobalOption("Headings", "On");
//  mgr.setGlobalOption("Strong's Numbers", "Off");
//  mgr.setGlobalOption("Lemmas", "Off");
//  mgr.setGlobalOption("Greek Accents", "Off");
//  mgr.setGlobalOption("Footnotes", "On");
//  mgr.setGlobalOption("Cross-references", "On");

    mgr.setGlobalOption("Headings", "Off");
    mgr.setGlobalOption("Strong's Numbers", "Off");
    mgr.setGlobalOption("Lemmas", "Off");
    mgr.setGlobalOption("Greek Accents", "Off");
    mgr.setGlobalOption("Footnotes", "Off");
    mgr.setGlobalOption("Cross-references", "Off");

    SWModule *module = mgr.getModule(module_name);

    /*
     * >=0 - regex; (for backward compat, if > 0 then used as additional REGEX FLAGS)
     * -1  - phrase
     * -2  - multiword
     * -3  - entryAttrib (eg. Word//Lemma./G1234/)   (Lemma with dot means check components (Lemma.[1-9]) also)
     * -4  - clucene
     * -5  - multilemma window; flags = window size
     */

    ListKey results = module->search(search_string, -2, REG_ICASE);

    for (int i = 0; i < results.getCount(); i++) {
        module->setKey(results.getElement());
        std::cout
            << results.getText()
            << " - "
            << module->renderText()
            << std::endl;
        results.increment();
    }

    return results.getCount();
}
```

Given that code, if I run a search for "so loved god" against the KJV module, I get 3 verses: Hosea 2:23, Hosea 3:1, and I Timothy 6:2. I'm not sure why I Timothy 6:2 is listed, because it contains neither "so" nor "god". There is also a slew of verses that I expected to show up but don't (like John 3:16), presumably because the case-insensitive flag `REG_ICASE` isn't working. BTW, I am compiling my code on Ubuntu 20.04, and `REG_ICASE` resolves to `2`.
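To make the expectation concrete, here is a minimal standalone sketch of the rule I am assuming a case-insensitive multiword search applies: a verse matches when every word of the query occurs somewhere in it, regardless of case or word order. `toLower` and `matchesAllWords` are hypothetical helpers written for this illustration. They are not SWORD functions, and this is not necessarily how SWORD implements search type `-2`.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case a copy of the input (ASCII only; enough for this illustration).
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper, NOT part of SWORD: true when every whitespace-separated
// word of `query` occurs as a substring of `verse`, ignoring case and order.
bool matchesAllWords(const std::string &verse, const std::string &query) {
    const std::string haystack = toLower(verse);
    std::istringstream words(toLower(query));
    std::string word;
    while (words >> word) {
        if (haystack.find(word) == std::string::npos) {
            return false;  // one query word is missing, so no match
        }
    }
    return true;
}
```

Under this rule, `matchesAllWords("For God so loved the world", "so loved god")` is true, while text containing only "beloved" (which includes "loved" as a substring) but neither "so" nor "god" is rejected.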

Also, if I include the `SEARCHFLAG_STRICTBOUNDARIES` flag like so: `SEARCHFLAG_STRICTBOUNDARIES | REG_ICASE`, and search for 'god so loved', I get no results, which reinforces my theory that the `REG_ICASE` flag isn't doing anything. If I search for 'God so loved', I get 6 results. I am, however, expecting 10. Here are the verses that I expect but that are not returned:

• Nehemiah 13:26
• Galatians 2:20
• 2 Peter 1:17
• I John 4:10

I have also tried replacing the if/else block at the top with just `#include "regex.h"`, but that makes no difference.
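One way to check the flag in isolation, independent of SWORD, is to call the POSIX `regcomp`/`regexec` functions from `regex.h` directly. The sketch below assumes a POSIX system (such as the Ubuntu build above); `posixMatches` is a hypothetical helper name, not something from the library.

```cpp
#include <regex.h>   // POSIX regcomp/regexec/regfree, not std::regex
#include <string>

// Hypothetical helper, not part of SWORD: returns true when `pattern`
// matches somewhere in `text`, compiling it with the given POSIX flags.
bool posixMatches(const std::string &pattern, const std::string &text, int cflags) {
    regex_t re;
    // REG_NOSUB: only a yes/no answer is needed, no match offsets.
    if (regcomp(&re, pattern.c_str(), cflags | REG_NOSUB) != 0) {
        return false;  // pattern failed to compile
    }
    const bool found = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return found;
}
```

If `posixMatches("god so loved", "For God so loved the world", REG_ICASE)` returns true while the same call with flags `0` returns false, then `REG_ICASE` itself behaves as documented, which would suggest the problem lies in how the flag is handled by the multiword search rather than in `regex.h`.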
Thank you for any help.

- Patrick