[sword-devel] search failing in Hebrew modules
Troy A. Griffitts
scribe at crosswire.org
Tue Aug 4 04:16:55 MST 2009
Guys,
Sorry for not being on top of this sooner. OK, let's hammer this out.
Karl, thanks for the data, that's great. This is what I'm planning to
do when I actually wake up in a few hours:
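Add a new tests/striptest.cpp:

#include <iostream>
#include <cstring>
#include <swmgr.h>
#include <swmodule.h>
using namespace sword;
using namespace std;

// usage: striptest <module> <key> <search target>
int main(int argc, char **argv) {
	if (argc < 4) { cerr << "usage: " << argv[0] << " <module> <key> <search target>\n"; return 1; }
	SWMgr library;
	SWModule *book = library.getModule(argv[1]);
	if (!book) { cerr << "module not found: " << argv[1] << "\n"; return 1; }
	StringList filters = library.getGlobalOptions();
	for (StringList::iterator it = filters.begin(); it != filters.end(); ++it) {
		// blindly turn off all filters. Some filters don't support "Off", but
		// that's ok, we should just silently fail on those.
		library.setGlobalOption(*it, "Off");
	}
	book->setKey(argv[2]);  // position on the entry before stripping it
	SWBuf entryStripped = book->StripText();
	// copy the result: the const char * from StripText() may be overwritten by the next call
	SWBuf targetStripped = book->StripText(argv[3]);
	cout << "RawEntry:\n" << book->getRawEntry() << "\n";
	cout << "StripText:\n" << entryStripped << "\n";
	cout << "Search Target: " << argv[3] << "\n";
	cout << "Search Target StripText: " << targetStripped << "\n";
	cout << "Found: " << (strstr(entryStripped.c_str(), targetStripped.c_str()) ? "true" : "false") << endl;
	return 0;
}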
and we'll try it with Karl's example data:
./striptest WLC Gen.1.9 "מתחת"
and send it to a hex display if necessary, and see what we're missing.
I'm guessing the root of the problem is something our UTF8HebrewPoints
filter misses. Or, if this test prints "Found: true", it might be our
case-folding code.
Anyway, if someone beats me to it and tries the above test before I wake
up, let me know the results.
Again, sorry for not being more responsive the last couple days with
this. This is something we really need to iron out for Hebrew and other
languages as well. Thanks for pushing on this issue.
-Troy.
Karl Kleinpaste wrote:
> "Troy A. Griffitts" <scribe at crosswire.org> writes:
>> Anyone willing to put the time into investigating if proper UTF-8 is
>> being sent into the SWORD engine from the copy and paste from Xiphos?
>
> I'll need some help here, converting octal crud from gdb to what folks
> think should be the Hebrew.
>
> My example search is:
> - Xiphos in up-to-date F11
> - Sword at -r2437
> - WLC 1.6
> - no CLucene index
> - plain ol' multiword search
> (sidebar search defaults to "indexed," with fallback to multiword in
> absence of index)
> - search scope limited to Genesis
> - copying/pasting word #5 from Gen 1:9, "מתחת"
> (again, XEmacs is not entirely happy w/Hebrew, so I hope that appears
> properly to the rest of you)
>
> With vowel points off, stepping through Xiphos' acquisition of the text
> from the input box, search_string is:
>
> $1 = 0x973f878 "\327\236\327\252\327\227\327\252"
>
> search_string is untouched down into the Sword search call. No results.
>
> Turning vowel points on, but searching on the same un-vowel-pointed
> string changes nothing, I get no results. (No surprise, but I'm trying
> to be exhaustive.)
>
> Re-pasting the now-vowel-pointed word for search, search_string is:
>
> $7 = 0xb9c82e8 "\327\236\326\264\327\252\326\274\326\267\327\227\326\267\327\252"
>
> And again, no results.
>
> Matthew says he got results in the vowel points = "on" case, but I
> don't. The only difference I know between us is that I use Fedora and
> he uses Ubuntu, so there is perhaps some version skew on other linked
> libraries, but there is no other library in between Xiphos and Sword's
> search, so I can't explain how we get different results, when he does
> multiword, non-CLucene searches.
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page