[sword-devel] Search Bug?

Patrick Stephan pstephan1187 at gmail.com
Tue Apr 25 09:16:26 EDT 2023


THAT WORKED! Oh my word, I've spent so much time digging through this. Thank you! The script still isn't detecting the ICU lib correctly (it's possible I might be doing something wrong), but the case-insensitive search is working now. Thank you so much!

- Patrick
On Apr 25, 2023 at 12:21 AM -0500, Troy A. Griffitts <scribe at crosswire.org>, wrote:
> Hi Stephan,
>
> Have a go with SWORD SVN.
>
> I wonder if ICU detection in trunk can handle the latest versions of libicu-dev on your Ubuntu box.
>
> svn co https://crosswire.org/svn/sword/trunk sword
>
> Thanks for looking into this.
>
> Also, you may wish to compile the examples/cmdline/search.cpp as is and see if you get the same results.
>
> Blessings,
>
> Troy
>
>
> > On April 24, 2023 9:58:48 PM MST, Patrick Stephan <pstephan1187 at gmail.com> wrote:
> > > If it helps... I am attempting to compile the library with ICU:
> > > ```
> > > RUN cd sword-1.9.0 && ./usrinst.sh --enable-shared --with-icu --with-icuregex
> > > ```
> > >
> > > However, it would appear that the compilation process is not detecting ICU correctly.
> > >
> > > Here are all the libraries being installed via apt-get on top of the base Ubuntu 22.04 image (I incorrectly mentioned before that it was Ubuntu 20.04):
> > >
> > > gnupg gosu curl ca-certificates zip unzip git
> > > supervisor sqlite3 libcap2-bin libpng-dev python2
> > > dnsutils subversion build-essential autotools-dev
> > > pkg-config libz-dev libclucene-dev libicu-dev
> > > libcurl4-gnutls-dev libtool m4 automake cmake zlib1g-dev
> > >
> > > And here is the output of the `./usrinst.sh --enable-shared --with-icu --with-icuregex` command:
> > >
> > > checking build system type... aarch64-unknown-linux-gnu
> > > checking host system type... aarch64-unknown-linux-gnu
> > > checking target system type... aarch64-unknown-linux-gnu
> > > checking for a BSD-compatible install... /usr/bin/install -c
> > > checking whether build environment is sane... yes
> > > checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
> > > checking for gawk... no
> > > checking for mawk... mawk
> > > checking whether make sets $(MAKE)... yes
> > > checking whether make supports nested variables... yes
> > > checking for gcc... gcc
> > > checking whether the C compiler works... yes
> > > checking for C compiler default output file name... a.out
> > > checking for suffix of executables...
> > > checking whether we are cross compiling... no
> > > checking for suffix of object files... o
> > > checking whether we are using the GNU C compiler... yes
> > > checking whether gcc accepts -g... yes
> > > checking for gcc option to accept ISO C89... none needed
> > > checking whether gcc understands -c and -o together... yes
> > > checking whether make supports the include directive... yes (GNU style)
> > > checking dependency style of gcc... gcc3
> > > checking for g++... g++
> > > checking whether we are using the GNU C++ compiler... yes
> > > checking whether g++ accepts -g... yes
> > > checking dependency style of g++... gcc3
> > > checking how to print strings... printf
> > > checking for a sed that does not truncate output... /usr/bin/sed
> > > checking for grep that handles long lines and -e... /usr/bin/grep
> > > checking for egrep... /usr/bin/grep -E
> > > checking for fgrep... /usr/bin/grep -F
> > > checking for ld used by gcc... /usr/bin/ld
> > > checking if the linker (/usr/bin/ld) is GNU ld... yes
> > > checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
> > > checking the name lister (/usr/bin/nm -B) interface... BSD nm
> > > checking whether ln -s works... yes
> > > checking the maximum length of command line arguments... 1572864
> > > checking how to convert aarch64-unknown-linux-gnu file names to aarch64-unknown-linux-gnu format... func_convert_file_noop
> > > checking how to convert aarch64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
> > > checking for /usr/bin/ld option to reload object files... -r
> > > checking for objdump... objdump
> > > checking how to recognize dependent libraries... pass_all
> > > checking for dlltool... dlltool
> > > checking how to associate runtime and link libraries... printf %s\n
> > > checking for ar... ar
> > > checking for archiver @FILE support... @
> > > checking for strip... strip
> > > checking for ranlib... ranlib
> > > checking command to parse /usr/bin/nm -B output from gcc object... ok
> > > checking for sysroot... no
> > > checking for a working dd... /usr/bin/dd
> > > checking how to truncate binary pipes... /usr/bin/dd bs=4096 count=1
> > > checking for mt... no
> > > checking if : is a manifest tool... no
> > > checking how to run the C preprocessor... gcc -E
> > > checking for ANSI C header files... yes
> > > checking for sys/types.h... yes
> > > checking for sys/stat.h... yes
> > > checking for stdlib.h... yes
> > > checking for string.h... yes
> > > checking for memory.h... yes
> > > checking for strings.h... yes
> > > checking for inttypes.h... yes
> > > checking for stdint.h... yes
> > > checking for unistd.h... yes
> > > checking for dlfcn.h... yes
> > > checking for objdir... .libs
> > > checking if gcc supports -fno-rtti -fno-exceptions... no
> > > checking for gcc option to produce PIC... -fPIC -DPIC
> > > checking if gcc PIC flag -fPIC -DPIC works... yes
> > > checking if gcc static flag -static works... yes
> > > checking if gcc supports -c -o file.o... yes
> > > checking if gcc supports -c -o file.o... (cached) yes
> > > checking whether the gcc linker (/usr/bin/ld) supports shared libraries... yes
> > > checking whether -lc should be explicitly linked in... no
> > > checking dynamic linker characteristics... GNU/Linux ld.so
> > > checking how to hardcode library paths into programs... immediate
> > > checking whether stripping libraries is possible... yes
> > > checking if libtool supports shared libraries... yes
> > > checking whether to build shared libraries... yes
> > > checking whether to build static libraries... yes
> > > checking how to run the C++ preprocessor... g++ -E
> > > checking for ld used by g++... /usr/bin/ld
> > > checking if the linker (/usr/bin/ld) is GNU ld... yes
> > > checking whether the g++ linker (/usr/bin/ld) supports shared libraries... yes
> > > checking for g++ option to produce PIC... -fPIC -DPIC
> > > checking if g++ PIC flag -fPIC -DPIC works... yes
> > > checking if g++ static flag -static works... yes
> > > checking if g++ supports -c -o file.o... yes
> > > checking if g++ supports -c -o file.o... (cached) yes
> > > checking whether the g++ linker (/usr/bin/ld) supports shared libraries... yes
> > > checking dynamic linker characteristics... (cached) GNU/Linux ld.so
> > > checking how to hardcode library paths into programs... immediate
> > > checking whether byte ordering is bigendian... no
> > > checking for pkg-config... /usr/bin/pkg-config
> > > checking pkg-config is at least version 0.9.0... yes
> > > checking for CLUCENE2... yes
> > > checking whether to enable maintainer-specific portions of Makefiles... no
> > > checking for compress in -lz... yes
> > > checking for library containing regexec... none required
> > > configure: Using system regex.h
> > > checking for cppunit-config... no
> > > checking for Cppunit - version >= 1.8.0... checking for pkg-config... (cached) /usr/bin/pkg-config
> > > checking pkg-config is at least version PKG_CONFIG... yes
> > > checking for icu-config... no
> > > *** The icu-config script installed by icu could not be found
> > > *** compiling without ICU support
> > > checking for curl-config... /usr/bin/curl-config
> > > curl found - remote install options available
> > > clucene 2.x found - lucene searching options available
> > > checking for main in -lxapian... no
> > > checking for vsnprintf... yes
> > > checking compiler warnings
> > > WARNING_CHECK: -Wno-address
> > > WARNING_CHECK: -Wno-nonnull-compare
> > > WARNINGS_OFF:  -Wno-address -Wno-nonnull-compare
> > > ./configure: line 18206: test: too many arguments
> > > checking that generated files are newer than configure... done
> > > configure: creating ./config.status
> > > config.status: creating Makefile
> > > config.status: creating lib/Makefile
> > > config.status: creating tests/Makefile
> > > config.status: creating tests/testsuite/Makefile
> > > config.status: creating tests/cppunit/Makefile
> > > config.status: creating utilities/Makefile
> > > config.status: creating examples/Makefile
> > > config.status: creating examples/cmdline/Makefile
> > > config.status: creating examples/tasks/Makefile
> > > config.status: creating utilities/diatheke/Makefile
> > > config.status: creating sword.pc
> > > config.status: creating include/swversion.h
> > > config.status: creating sword.spec
> > > config.status: creating include/config.h
> > > config.status: executing depfiles commands
> > > config.status: executing libtool commands
> > >
> > >
> > > Configuration:
> > >
> > >  Settings:
> > >      LIBDIR:               /usr/lib
> > >      DEBUG:                yes
> > >      PROFILE:              no
> > >      BUILD TESTS:          yes
> > >      BUILD EXAMPLES:       no
> > >      BUILD UTILITIES:      yes
> > >      STRIP LOG DEBUG:      no
> > >      STRIP LOG INFO:       no
> > >
> > >  Dependencies for standard use:
> > >      REGEX:                yes
> > >      ZLIB:                 yes
> > >      LIBICU:               no
> > >      LIBCURL:              yes
> > >      CLUCENE-CORE:         yes 2.x
> > >
> > >  Optional / Experimental:
> > >      LIBCURL SFTP:         yes
> > >      BZIP2:                no
> > >      XZ:                   no
> > >      ICUSWORD:             no
> > >      ICU-REGEX:            requested; but using ICU not enabled
> > >      CXX11-REGEX:          no
> > >      CXX11-TIME:           yes
> > >      XAPIAN-CORE:          no
> > >      GAPI:                 no
> > >
> > >
> > >
> > > Configured to NOT write a global /etc/sword.conf on 'make install'.
> > > If this is the first time you've installed sword, be sure to run
> > > 'make install_config' if you would like a basic configuration installed
> > >
> > > Next you might try something like:
> > >
> > > make
> > > sudo make install
> > > # (and optionally)
> > > sudo make install_config
> > > make register
> > >
> > > --------------------------------------------
> > >
> > > > I think the issue has to do with the changes in the ICU package in more recent versions of Ubuntu, but it's getting too late for me to investigate further. I'll keep digging tomorrow unless someone with deeper knowledge replies with the answer.
> > >
> > > - Patrick
> > > On Apr 24, 2023 at 11:07 PM -0500, Patrick Stephan <pstephan1187 at gmail.com>, wrote:
> > > > > So I've been digging into this a bit, and it would appear that the `toUpper()` method on the SWBuf class isn't working. I did a little hacking on the SWModule class and dropped in some debugging `printf` calls to see what was going on and why I wasn't getting my expected results. Here I print out the word being queried:
> > > >
> > > > ```
> > > > printf("word check: %s\n", words[i].toUpper().c_str());
> > > > ```
> > > >
> > > > And here is the result:
> > > >
> > > > ```
> > > > word check: God
> > > > ```
> > > >
> > > > I'm doing this in between lines 771 and 772 in `swmodule.cpp`. I did the same type of thing on the `textBuf` in the search method and no casing is changed. Any thoughts on all this?
> > > >
> > > > > I am wondering if it has to do with the more modern OS that I am using. I am attempting this on Ubuntu 20.04, and Sword 1.9.0 (which is the latest version, as best I can tell) is over 2 years old.
> > > >
> > > > > That brings me to the next question: What is the development status on this? Is it being actively worked on? I would be willing to get involved, but I am only just now learning C/C++, so I would need some hand-holding, so to speak, to get me up to speed.
> > > >
> > > >
> > > > - Patrick
> > > > On Apr 19, 2023 at 10:32 AM -0500, Patrick Stephan <pstephan1187 at gmail.com>, wrote:
> > > > > > I am new to C/C++ and to this library, so I could be misunderstanding how the module search system works, but I think I may have found a bug.
> > > > >
> > > > > Here is my code:
> > > > > ```
> > > > > > #include <iostream>  // std::cout is used below
> > > > > > #include <vector>
> > > > > #include "swmgr.h"
> > > > > #include "swmodule.h"
> > > > > #include "markupfiltmgr.h"
> > > > > #include "modules.h"
> > > > > #include "php_sword.h"
> > > > >
> > > > > #if defined(USECXX11REGEX)
> > > > > #include <regex>
> > > > > #ifndef REG_ICASE
> > > > > #define REG_ICASE std::regex::icase
> > > > > #endif
> > > > > #elif defined(USEICUREGEX)
> > > > > #include <unicode/regex.h>
> > > > > #ifndef REG_ICASE
> > > > > #define REG_ICASE UREGEX_CASE_INSENSITIVE
> > > > > #endif
> > > > > #else
> > > > > #include <regex.h>
> > > > > #endif
> > > > >
> > > > > using namespace::sword;
> > > > >
> > > > > //  unrelated function here...
> > > > >
> > > > > int sword_search_module(char* module_name, char* search_string) {
> > > > >     SWMgr mgr(new MarkupFilterMgr(FMT_XHTML));
> > > > >
> > > > > //  mgr.setGlobalOption("Headings", "On");
> > > > > //  mgr.setGlobalOption("Strong's Numbers", "Off");
> > > > > //  mgr.setGlobalOption("Lemmas", "Off");
> > > > > //  mgr.setGlobalOption("Greek Accents", "Off");
> > > > > //  mgr.setGlobalOption("Footnotes", "On");
> > > > > //  mgr.setGlobalOption("Cross-references", "On");
> > > > >
> > > > >     mgr.setGlobalOption("Headings", "Off");
> > > > >     mgr.setGlobalOption("Strong's Numbers", "Off");
> > > > >     mgr.setGlobalOption("Lemmas", "Off");
> > > > >     mgr.setGlobalOption("Greek Accents", "Off");
> > > > >     mgr.setGlobalOption("Footnotes", "Off");
> > > > >     mgr.setGlobalOption("Cross-references", "Off");
> > > > >
> > > > >     SWModule *module = mgr.getModule(module_name);
> > > > >
> > > > >     /*
> > > > >      * >=0 - regex; (for backward compat, if > 0 then used as additional REGEX FLAGS)
> > > > >      * -1  - phrase
> > > > >      * -2  - multiword
> > > > >      * -3  - entryAttrib (eg. Word//Lemma./G1234/)   (Lemma with dot means check components (Lemma.[1-9]) also)
> > > > >      * -4  - clucene
> > > > >      * -5  - multilemma window; flags = window size
> > > > >      */
> > > > >
> > > > >     ListKey results = module->search(search_string, -2, REG_ICASE);
> > > > >
> > > > >     for (int i = 0; i < results.getCount(); i++) {
> > > > >         module->setKey(results.getElement());
> > > > >         std::cout
> > > > >             << results.getText()
> > > > >             << " - "
> > > > >             << module->renderText()
> > > > >             << std::endl;
> > > > >         results.increment();
> > > > >     }
> > > > >
> > > > >     return results.getCount();
> > > > > }
> > > > > ```
> > > > >
> > > > > > Given that code, if I run a search for "so loved god" against the KJV module, I get 3 verses: Hosea 2:23, Hosea 3:1, and I Timothy 6:2. I'm not sure why I Tim 6:2 is listed because it contains neither "so" nor "god". There are also a slew of verses that are expected to show up but don't (like John 3:16), presumably because the case-insensitive flag `SEARCHFLAG_STRICTBOUNDARIES | REG_ICASE` isn't working. BTW, I am compiling my code against Ubuntu 20.04 and `REG_ICASE` resolves to `2`.
> > > > >
> > > > > > Also, if I include the `SEARCHFLAG_STRICTBOUNDARIES` flag like so: `SEARCHFLAG_STRICTBOUNDARIES | REG_ICASE`, and search for 'god so loved', I get no results, further reinforcing my theory that the `REG_ICASE` flag isn't doing anything. If I search for 'God so loved' then I get 6 results. I am, however, expecting 10. Here are the verses that I expect but that are not returned:
> > > > >
> > > > > • Nehemiah 13:26
> > > > > • Galatians 2:20
> > > > > • 2 Peter 1:17
> > > > > • I John 4:10
> > > > >
> > > > > > I have also tried replacing the if/else block at the top with just `#include "regex.h"`, but that makes no difference.
> > > > > > Thank you for any help.
> > > > >
> > > > > - Patrick