[jsword-devel] Search again (part 2 - Code design)

DM Smith dmsmith555 at yahoo.com
Tue Sep 28 14:52:27 MST 2004


This all sounds good.

I especially agree that JSword should have its own search syntax as a 
layer over the search engine.
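
To make the layering concrete, here is a rough sketch of how a JSword-level token such as G4356 could be rewritten into the engine-level form Joe mentions below. The class and method names are made up for illustration, and the "strongs-hebrew" field name is purely my assumption; only "strongs-greek:4356" comes from Joe's message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not existing JSword code.
final class StrongsTokenTranslator {
    // Matches a whole token such as G4356 or H430: one letter, then digits.
    private static final Pattern STRONGS = Pattern.compile("([GgHh])(\\d+)");

    // Rewrites a Strong's-number token into a field:value form.
    // Any token that is not a Strong's number is returned unchanged.
    static String translate(String token) {
        Matcher m = STRONGS.matcher(token);
        if (!m.matches()) {
            return token;
        }
        // "strongs-hebrew" is an assumed field name, by analogy with "strongs-greek".
        String field = m.group(1).equalsIgnoreCase("G") ? "strongs-greek" : "strongs-hebrew";
        return field + ":" + m.group(2);
    }
}
```

The point is only that the user-facing syntax stays ours, and the translation to whatever the engine wants lives in one place.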

I like your use of symbols for the syntax as that is language independent.

I was thinking that we could later add a pre-parse step that lets users 
type natural-language connectors, which would then be translated into 
the symbols.
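
Roughly what I have in mind, as a sketch only. The connector words and the symbols they map to ("+", "/", "-") are placeholders I picked for the example, not the actual JSword syntax, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical pre-parse step, not existing JSword code.
final class ConnectorPreParser {
    // Placeholder mapping from English connectors to query symbols.
    // A localized build could load a different table per language.
    private static final Map<String, String> CONNECTORS = new LinkedHashMap<>();
    static {
        CONNECTORS.put("and", "+");
        CONNECTORS.put("or", "/");
        CONNECTORS.put("not", "-");
    }

    // Replaces each whitespace-separated connector word with its symbol
    // and leaves every other token untouched.
    static String preParse(String query) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : query.trim().split("\\s+")) {
            String symbol = CONNECTORS.get(token.toLowerCase(Locale.ROOT));
            out.add(symbol != null ? symbol : token);
        }
        return out.toString();
    }
}
```

Since the real parser would only ever see symbols, the connector table is the only part that needs translating.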

With regard to the Thesaurus: it is a great idea and will be very useful. 
I don't think that it will be perfect, though, and it will result in some 
surprising answers.

I think that a thesaurus will be difficult to internationalize and to 
localize (across both time and geography). For example, the words in a 
KJV thesaurus may differ from those in an RSV thesaurus.
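
For discussion, here is a minimal sketch of the interface Joe describes below, with the trivial implementation that returns only the word it was given. The method name and the NullThesaurus class are my own guesses, not anything that exists in JSword.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical shape for the interface: "get me all the words that mean the same as X".
interface Thesaurus {
    Collection<String> getSynonyms(String word);
}

// Trivial implementation: every word is its own only synonym.
final class NullThesaurus implements Thesaurus {
    @Override
    public Collection<String> getSynonyms(String word) {
        return Collections.singleton(word);
    }
}
```

If the interface stays this small, a per-translation implementation (one for the KJV, another for the RSV) could be swapped in later without touching the Matcher.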

Joe Walker wrote:

>Hi,
>
>Given the above design, it strikes me that we want to have control
>over the search syntax. If we simply use a Lucene QueryParser to
>dictate our search syntax then we will find that we can't make G4356
>mean "strongs greek number 4356", we would need to use
>"strongs-greek:4356" because the : is an important part of the Lucene
>syntax.
>
>Also letting Lucene (or any other search engine) control our search
>syntax means swapping search engines affects all our users. If we do
>the parsing then it doesn't, and we have a query parser already so we
>can carry on using it. This does not mean that we have to abandon
>Lucene, just that we use a FieldQuery rather than a QueryParser (in
>Lucene speak).
>
>Now this was the route that I was originally going down, except that
>making our Matcher code work on top of Lucene is hard because one
>thing a search Index needs to be able to do (to support Matcher) is to
>find all words that start with some string. This is to allow stemming
>(to make Love, Loving, Loves, etc. all match Lover, for example).
>However the real requirement for the Matcher is a kind of Thesaurus
>interface - "get me all the words that mean the same as X".
>
>So my current plan is to have a Thesaurus interface that we either
>implement by returning the word that we were sent alone, or if I can
>work out how to use Lucene more, by using more info from Lucene.
>
>Joe.