[jsword-devel] Searching in German

Jonathan Morgan jonmmorgan at gmail.com
Fri Feb 8 06:02:03 MST 2008


On Feb 7, 2008 1:59 AM, Manfred Bergmann <bergmannmd at yahoo.de> wrote:
>
> On 06.02.2008 at 15:34, DM Smith wrote:
>
> > Manfred Bergmann wrote:
> >> Hi DM.
> >>
> >> AFAIK, Lucene from version 2.1 can deal with leading wildcards.
> >>
> > Lucene 2.3 still throws an error: "Cannot parse '*ning': '*' or '?'
> > not allowed as first character in WildcardQuery"
> >
> > Perhaps, there's something that needs to be done to enable it?
>
> Yes, there is a flag in the QueryParser class called
> setAllowLeadingWildcard(boolean).
> It has been available since v2.1.
>
> >
> >
> >> It would find "*schiff".
> >> Wouldn't this be enough?
> >>
> > If a German speaker searches for "schiff" would they expect to find in
> > words like donaudampfschiff?
>
> They probably would expect to find it.
> But sometimes you find search options like "exact search phrase". Then
> it should find only "schiff".
> Otherwise I would expect the search engine to add wildcards in front and
> behind, so that any word containing this token is found.

For what it's worth, I believe that exact search should be the
default, for two main reasons:
1. Non-exact search has (in my opinion) greater potential to surprise
the user than exact search.  For example, if I search for thirst in
the Bible, then (depending on my version) I will get results such as
thirst, thirsts, and thirsted (all of which I may want) but I will
also get bloodthirsty.  If I search for need, then I will get need,
needs, needed, needy, etc., but I will also get needlework.  I have
had quite a few searches (which I can't remember offhand now) where
non-exact search found me a few extra verses I wanted, but had a
greater than 70% false positive rate, and as a user I don't expect
behaviour like this to be the default.  What I really want is
a way to search for the word need and all its derivatives without
including every word with need in it.  (Does stemming or something
similar support this, and if so, does BD include it?)  As a user, if I
search for a thing and notice that it isn't coming up with exact
search, then I can easily switch to non-exact search.  However, if
non-exact search is the default then there is a greater chance of my
being drowned in results that I don't want.  The aim of a search
should probably be to get the minimum useful set of results.

2. Performance: I don't know what (if any) performance impact results
from using a wildcard search of the form *term*, but if it significantly
affects the performance of the search then it probably shouldn't be
the default.
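
To make the distinction in point 1 concrete, here is a minimal sketch
comparing exact matching, *term* style substring matching, and a crude
form of stemming on the words mentioned above.  The word list, the
suffix list and the helper names (toy_stem and friends) are all
assumptions made up for illustration; this is not how Lucene or JSword
stem words, and a real stemmer handles far more cases.

```python
# Words taken from the examples in point 1.
WORDS = [
    "thirst", "thirsts", "thirsted", "bloodthirsty",
    "need", "needs", "needed", "needy", "needlework",
]

# Hypothetical suffix list, just enough for these examples.
SUFFIXES = ("ed", "s", "y")


def exact(term, words):
    """Exact search: only the word itself matches."""
    return [w for w in words if w == term]


def substring(term, words):
    """Non-exact search, roughly what *term* would match."""
    return [w for w in words if term in w]


def toy_stem(word):
    """Strip one known suffix, keeping at least three letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed(term, words):
    """Match words whose toy stem equals the toy stem of the term."""
    target = toy_stem(term)
    return [w for w in words if toy_stem(w) == target]


if __name__ == "__main__":
    for term in ("thirst", "need"):
        print(term, "exact:    ", exact(term, WORDS))
        print(term, "substring:", substring(term, WORDS))
        print(term, "stemmed:  ", stemmed(term, WORDS))
```

With this toy data the substring search picks up bloodthirsty and
needlework, while the stem-based search returns the derivatives
without them.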

Jon


