[sword-devel] indexed search discrepancy (and sword 1.6.0+dfsg-2)

Matthew Talbert ransom1982 at gmail.com
Sat Aug 29 20:08:28 MST 2009


>> Issue 2: search causes segfault when searching for stop words
>>      Resolution: set analyzer stop words to NULL for both index
>> creation and search. Possibly this would only have to be set for
>> search, and left on to lower the index size.
>
> The "possibly" worries me a bit :)  Do we need to test with and without
> the stopwords at index creation time, and see how much index size is
> affected?  Have you already done any testing along those lines?

OK, here are the results. All tests were done with my previous changes
applied; the only difference is that the first index for each module
includes stop words and the second doesn't.

Module   With stop words   Without stop words
KJV      7.3 MB            6.3 MB
Finney   654 KB            518 KB
ESV      5.9 MB            5.0 MB

As with all things size related, modules with substantially more
markup (e.g., KJV and ESV) show the biggest absolute difference in
index size. For the majority of non-Bible-text modules, there will
hardly be a measurable difference with the stop words added.
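
For a rough sense of scale, the figures above can be turned into
absolute and relative differences. This is only a sketch: the sizes
are the ones reported in this message, 1 MB is assumed to be 1024 KB,
and the helper name size_cost is made up for illustration.

```python
# Index sizes from this message as (with stop words, without stop words),
# converted to KB. Assumption: 1 MB = 1024 KB.
SIZES_KB = {
    "KJV": (7.3 * 1024, 6.3 * 1024),
    "Finney": (654.0, 518.0),
    "ESV": (5.9 * 1024, 5.0 * 1024),
}


def size_cost(with_stop, without_stop):
    """Return (extra KB, percent growth over the index without stop words)."""
    extra = with_stop - without_stop
    return extra, 100.0 * extra / without_stop


for name, (with_stop, without_stop) in SIZES_KB.items():
    extra, pct = size_cost(with_stop, without_stop)
    print(f"{name}: +{extra:.0f} KB ({pct:.1f}% larger with stop words)")
```

The percentage here is measured against the smaller index (the one
without stop words), so it reads as "how much the index grows if stop
words are kept".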

For me personally, it's worth the extra size to have them in, because
I can search for something like +the +lord to get all verses containing
both words, whereas if these words aren't included in the index, you
would get every result for "lord" whether or not the verse contains
"the". As another example, with the stop words I can do an indexed
search for "god is", which returns all verses where "is" directly
follows "God". Without the stop words, this search returns nothing.
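
The two searches above can be illustrated with a toy positional index.
This is not SWORD or CLucene code, just a minimal sketch under stated
assumptions: the stop word list, the sample verses and all function
names are hypothetical; the required-term search drops stop words from
the query the way a stop-word-aware analyzer would; and the phrase
search looks its terms up exactly as given.

```python
import re

STOP_WORDS = frozenset({"the", "is", "a", "of"})  # hypothetical list

# Hypothetical sample text, not taken from any module.
VERSES = {
    "v1": "The Lord is my shepherd",
    "v2": "Lord, hear my prayer",
    "v3": "God is love",
    "v4": "God so loved the world",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(verses, stop_words=frozenset()):
    """Map term -> {verse_id: [positions]}, skipping any stop words."""
    index = {}
    for verse_id, text in verses.items():
        for pos, term in enumerate(tokenize(text)):
            if term in stop_words:
                continue
            index.setdefault(term, {}).setdefault(verse_id, []).append(pos)
    return index


def search_all(index, terms, stop_words=frozenset()):
    """Required-term query (+a +b): stop words are dropped from the query."""
    kept = [t for t in terms if t not in stop_words]
    if not kept:
        return set()
    hits = set(index.get(kept[0], {}))
    for term in kept[1:]:
        hits &= set(index.get(term, {}))
    return hits


def search_phrase(index, terms):
    """Phrase query: each term must directly follow the previous one."""
    hits = set()
    for verse_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            if all(start + i in index.get(t, {}).get(verse_id, [])
                   for i, t in enumerate(terms)):
                hits.add(verse_id)
                break
    return hits


full = build_index(VERSES)                 # stop words kept in the index
trimmed = build_index(VERSES, STOP_WORDS)  # stop words left out

print(sorted(search_all(full, ["the", "lord"])))                 # ['v1']
print(sorted(search_all(trimmed, ["the", "lord"], STOP_WORDS)))  # ['v1', 'v2']
print(sorted(search_phrase(full, ["god", "is"])))                # ['v3']
print(sorted(search_phrase(trimmed, ["god", "is"])))             # []
```

With the full index, +the +lord narrows to the one verse that has both
words and "god is" finds the adjacent pair. With the trimmed index, the
first query widens to every "lord" verse and the phrase finds nothing,
because "is" has no postings at all.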

For those wondering why a search for "the lord" doesn't segfault: the
segfault only happens when you search for a stop word on its own.
If you want to talk about confusing users, the current system would
seem illogical (I searched for "god is" and got nothing??).

Matthew


