[sword-devel] Search Bug?

Patrick Stephan pstephan1187 at gmail.com
Tue Apr 25 00:58:48 EDT 2023


If it helps... I am attempting to compile the library with ICU:
```
RUN cd sword-1.9.0 && ./usrinst.sh --enable-shared --with-icu --with-icuregex
```

However, the configure step does not appear to detect ICU.

Here are all the libraries being installed via apt-get on top of the base Ubuntu 22.04 image (I incorrectly mentioned before that it was Ubuntu 20.04):

gnupg gosu curl ca-certificates zip unzip git
supervisor sqlite3 libcap2-bin libpng-dev python2
dnsutils subversion build-essential autotools-dev
pkg-config libz-dev libclucene-dev libicu-dev
libcurl4-gnutls-dev libtool m4 automake cmake zlib1g-dev

And here is the output of the `./usrinst.sh --enable-shared --with-icu --with-icuregex` command:

checking build system type... aarch64-unknown-linux-gnu
checking host system type... aarch64-unknown-linux-gnu
checking target system type... aarch64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking how to print strings... printf
checking for a sed that does not truncate output... /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert aarch64-unknown-linux-gnu file names to aarch64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert aarch64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... dlltool
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /usr/bin/dd
checking how to truncate binary pipes... /usr/bin/dd bs=4096 count=1
checking for mt... no
checking if : is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether byte ordering is bigendian... no
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for CLUCENE2... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for compress in -lz... yes
checking for library containing regexec... none required
configure: Using system regex.h
checking for cppunit-config... no
checking for Cppunit - version >= 1.8.0... checking for pkg-config... (cached) /usr/bin/pkg-config
checking pkg-config is at least version PKG_CONFIG... yes
checking for icu-config... no
*** The icu-config script installed by icu could not be found
*** compiling without ICU support
checking for curl-config... /usr/bin/curl-config
curl found - remote install options available
clucene 2.x found - lucene searching options available
checking for main in -lxapian... no
checking for vsnprintf... yes
checking compiler warnings
WARNING_CHECK: -Wno-address
WARNING_CHECK: -Wno-nonnull-compare
WARNINGS_OFF:  -Wno-address -Wno-nonnull-compare
./configure: line 18206: test: too many arguments
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating lib/Makefile
config.status: creating tests/Makefile
config.status: creating tests/testsuite/Makefile
config.status: creating tests/cppunit/Makefile
config.status: creating utilities/Makefile
config.status: creating examples/Makefile
config.status: creating examples/cmdline/Makefile
config.status: creating examples/tasks/Makefile
config.status: creating utilities/diatheke/Makefile
config.status: creating sword.pc
config.status: creating include/swversion.h
config.status: creating sword.spec
config.status: creating include/config.h
config.status: executing depfiles commands
config.status: executing libtool commands


Configuration:

 Settings:
     LIBDIR:               /usr/lib
     DEBUG:                yes
     PROFILE:              no
     BUILD TESTS:          yes
     BUILD EXAMPLES:       no
     BUILD UTILITIES:      yes
     STRIP LOG DEBUG:      no
     STRIP LOG INFO:       no

 Dependencies for standard use:
     REGEX:                yes
     ZLIB:                 yes
     LIBICU:               no
     LIBCURL:              yes
     CLUCENE-CORE:         yes 2.x

 Optional / Experimental:
     LIBCURL SFTP:         yes
     BZIP2:                no
     XZ:                   no
     ICUSWORD:             no
     ICU-REGEX:            requested; but using ICU not enabled
     CXX11-REGEX:          no
     CXX11-TIME:           yes
     XAPIAN-CORE:          no
     GAPI:                 no



Configured to NOT write a global /etc/sword.conf on 'make install'.
If this is the first time you've installed sword, be sure to run
'make install_config' if you would like a basic configuration installed

Next you might try something like:

make
sudo make install
# (and optionally)
sudo make install_config
make register

--------------------------------------------

I think the issue has to do with the changes in the ICU package in more recent versions of Ubuntu, but it's getting too late for me to investigate further. I'll keep digging tomorrow unless someone with deeper knowledge replies with the answer.

- Patrick
On Apr 24, 2023 at 11:07 PM -0500, Patrick Stephan <pstephan1187 at gmail.com>, wrote:
> So I've been digging into this a bit, and it would appear that the `toUpper()` method on the SWBuf class isn't working. I did a little hacking on the SWModule class and dropped in some debugging `printf` calls to see what was going on and why I wasn't getting my expected results. Here I print out the word being queried:
>
> ```
> printf("word check: %s\n", words[i].toUpper().c_str());
> ```
>
> And here is the result:
>
> ```
> word check: God
> ```
>
> I'm doing this in between lines 771 and 772 in `swmodule.cpp`. I did the same type of thing on the `textBuf` in the search method and no casing is changed. Any thoughts on all this?
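
For comparison, this is the result I expected from the `printf` above. The sketch below is a standalone baseline using only the standard library. It is not how `SWBuf::toUpper()` is implemented, and `asciiUpper` is a made-up name. It only handles ASCII letters (in the default "C" locale), which is enough for a word like "God".

```cpp
#include <cassert>
#include <cctype>
#include <iostream>
#include <string>

// Hypothetical stand-in, NOT SWBuf::toUpper(): uppercases each byte with
// std::toupper, which in the default "C" locale only changes ASCII letters.
std::string asciiUpper(std::string s) {
    for (char &c : s) {
        // Cast to unsigned char first; passing a negative char is undefined.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

int main() {
    assert(asciiUpper("God") == "GOD");
    std::cout << "word check: " << asciiUpper("God") << std::endl;
    return 0;
}
```
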
>
> I am wondering if it has to do with the more modern OS that I am using. I am attempting this on Ubuntu 20.04, and Sword 1.9.0 (which is the latest version best I can tell) is over 2 years old.
>
> That brings me to the next question: What is the development status on this? Is it being actively worked on? I would be willing to get involved, but I am only just now learning C/C++, so I would need some hand-holding, so to speak, to get me up to speed.
>
>
> - Patrick
> On Apr 19, 2023 at 10:32 AM -0500, Patrick Stephan <pstephan1187 at gmail.com>, wrote:
> > I am new to C/C++ and to this library, so I could be misunderstanding how the module search system works, but I think I may have found a bug?
> >
> > Here is my code:
> > ```
> > #include <iostream>  // std::cout, std::endl
> > #include <vector>
> > #include "swmgr.h"
> > #include "swmodule.h"
> > #include "markupfiltmgr.h"
> > #include "modules.h"
> > #include "php_sword.h"
> >
> > #if defined(USECXX11REGEX)
> > #include <regex>
> > #ifndef REG_ICASE
> > #define REG_ICASE std::regex::icase
> > #endif
> > #elif defined(USEICUREGEX)
> > #include <unicode/regex.h>
> > #ifndef REG_ICASE
> > #define REG_ICASE UREGEX_CASE_INSENSITIVE
> > #endif
> > #else
> > #include <regex.h>
> > #endif
> >
> > using namespace sword;
> >
> > //  unrelated function here...
> >
> > int sword_search_module(char* module_name, char* search_string) {
> >     SWMgr mgr(new MarkupFilterMgr(FMT_XHTML));
> >
> > //  mgr.setGlobalOption("Headings", "On");
> > //  mgr.setGlobalOption("Strong's Numbers", "Off");
> > //  mgr.setGlobalOption("Lemmas", "Off");
> > //  mgr.setGlobalOption("Greek Accents", "Off");
> > //  mgr.setGlobalOption("Footnotes", "On");
> > //  mgr.setGlobalOption("Cross-references", "On");
> >
> >     mgr.setGlobalOption("Headings", "Off");
> >     mgr.setGlobalOption("Strong's Numbers", "Off");
> >     mgr.setGlobalOption("Lemmas", "Off");
> >     mgr.setGlobalOption("Greek Accents", "Off");
> >     mgr.setGlobalOption("Footnotes", "Off");
> >     mgr.setGlobalOption("Cross-references", "Off");
> >
> >     SWModule *module = mgr.getModule(module_name);
> >     if (!module) {
> >         return -1;  // module not installed or name misspelled
> >     }
> >
> >     /*
> >      * >=0 - regex; (for backward compat, if > 0 then used as additional REGEX FLAGS)
> >      * -1  - phrase
> >      * -2  - multiword
> >      * -3  - entryAttrib (eg. Word//Lemma./G1234/)   (Lemma with dot means check components (Lemma.[1-9]) also)
> >      * -4  - clucene
> >      * -5  - multilemma window; flags = window size
> >      */
> >
> >     ListKey results = module->search(search_string, -2, REG_ICASE);
> >
> >     for (int i = 0; i < results.getCount(); i++) {
> >         module->setKey(results.getElement());
> >         std::cout
> >             << results.getText()
> >             << " - "
> >             << module->renderText()
> >             << std::endl;
> >         results.increment();
> >     }
> >
> >     return results.getCount();
> > }
> > ```
> >
> > Given that code, if I run a search for "so loved god" against the KJV module, I get 3 verses: Hosea 2:23, Hosea 3:1, and I Timothy 6:2. I'm not sure why I Tim 6:2 is listed because it contains neither "so" nor "god". There are also a slew of verses that are expected to show up but don't (like John 3:16), presumably because the case-insensitive flag `REG_ICASE` isn't working. BTW, I am compiling my code against Ubuntu 20.04 and `REG_ICASE` resolves to `2`.
> >
> > Also, if I include the `SEARCHFLAG_STRICTBOUNDARIES` flag like so: `SEARCHFLAG_STRICTBOUNDARIES | REG_ICASE`, and search for 'god so loved', I get no results, further reinforcing my theory that the `REG_ICASE` flag isn't doing anything. If I search for 'God so loved' then I get 6 results. I am, however, expecting 10. Here are the verses I expect that are not returned:
> >
> > • Nehemiah 13:26
> > • Galatians 2:20
> > • 2 Peter 1:17
> > • I John 4:10
> >
> > I have also tried replacing the `#if`/`#else` block at the top with just `#include "regex.h"`, but that makes no difference.
> > Thank you for any help.
> >
> > - Patrick