[bt-devel] change in search algo
Martin Gruner
mg.pub at gmx.net
Sat Oct 21 12:48:05 MST 2006
Hi friends,
Today I changed BibleTime's (CVS) search implementation from using the
StandardAnalyzer to using the WhitespaceAnalyzer. The difference is that the
StandardAnalyzer applies a set of default English stop words to the text
being indexed and to the queries. That means words like "the", "they" and
"then" were not found, because they are assumed to produce too many results.
Within BibleTime, this does not seem acceptable to me, so I changed it. The
new analyzer just splits the text and the query into words at whitespace.
Everything will be indexed and can be queried. This means the index will be
slightly bigger, but everything can be found.
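To make the difference concrete, here is a rough sketch in plain Python. It
does not use the CLucene API; the function names and the tiny stop-word set
are made up for illustration, and the real StandardAnalyzer does more than
this (it also normalizes case and punctuation). It only shows why a
stop-word filter makes "the" unsearchable while a whitespace split keeps it.

```python
# Illustrative sketch only: plain Python, not the CLucene API.
# STOP_WORDS is a small hypothetical sample, not the analyzer's real list.
STOP_WORDS = {"the", "they", "then", "a", "and", "of"}


def whitespace_tokens(text):
    """Split on whitespace only; every token is kept unchanged."""
    return text.split()


def stop_filtered_tokens(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_index(verses, tokenize):
    """Map each token to the set of verse keys that contain it."""
    index = {}
    for key, text in verses.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(key)
    return index


verses = {
    "Gen 1:1": "In the beginning God created the heaven and the earth",
    "Gen 1:3": "And God said Let there be light",
}

filtered = build_index(verses, stop_filtered_tokens)
unfiltered = build_index(verses, whitespace_tokens)

print("the" in filtered)           # False
print(sorted(unfiltered["the"]))   # ['Gen 1:1']
```

The unfiltered index has more entries, which is the "slightly bigger" cost
mentioned above. Note that the whitespace split in this sketch is also
case-sensitive, so "And" and "and" end up as separate terms.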
Is this OK, or does anybody disagree? Please let me know.
mg
P.S. I also improved our own search highlighting a bit to handle "*" more
correctly. The best solution, however, would be to use CLucene for that as
well...