[jsword-devel] Re: Search and its bugs

Joe Walker joseph.walker at gmail.com
Fri Apr 8 16:02:46 MST 2005


Having a SearchSyntax sounds like a good idea to me.

It would be good if we could implement it using Lucene; we've talked about 
using its query parser in the past.

The problems with the search query parser probably come down to the way it 
has evolved, which seems to be a common pitfall for parser code: the parser 
grows to the point where squashing bugs becomes too regular, and then someone 
sits down and writes a grammar for it. I noticed that Groovy has just been 
through this.
I've dabbled with javacc successfully on a couple of projects, and once 
tried to write a COBOL grammar, very unsuccessfully, so I know it can be 
hard. A full grammar may well be overkill for our simple syntax, though.

Other than that, go for it!

Joe.


On Apr 8, 2005 12:52 PM, DM Smith <dmsmith555 at gmail.com> wrote:
> 
> I've narrowed down some of the bugs of search. Seems that the tokenizer
> is not producing the correct stream of tokens.
> Specifically, the algorithm using the tokens goes something like this:
> 
> while there are command tokens at the beginning of the stream, get the next one
> do
>     have that command consume word tokens until it reaches a
>     terminating condition
> done
> 
> The problem with +[mat-rev]"bread of life" is that it produces a token
> stream where +[mat-rev] is not followed by a command token.
> 
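To make the loop described above concrete, here is a minimal Java sketch. It is not JSword's parser code: the `Token` class, the `wordLimit` field standing in for the "terminating condition", and the treatment of `+[mat-rev]` as a command that takes no words are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class TokenLoopSketch {
    /** Hypothetical token: either a command or a plain word. */
    static final class Token {
        final String text;
        final boolean command;
        final int wordLimit; // assumed terminating condition: max words a command consumes

        private Token(String text, boolean command, int wordLimit) {
            this.text = text;
            this.command = command;
            this.wordLimit = wordLimit;
        }

        static Token command(String text, int wordLimit) {
            return new Token(text, true, wordLimit);
        }

        static Token word(String text) {
            return new Token(text, false, 0);
        }
    }

    /**
     * While a command token heads the stream, take it and let it consume
     * word tokens up to its limit. Returns the text of any tokens left over.
     */
    static List<String> leftoverAfterCommands(List<Token> stream) {
        Deque<Token> queue = new ArrayDeque<>(stream);
        while (!queue.isEmpty() && queue.peek().command) {
            Token cmd = queue.poll();
            int taken = 0;
            while (taken < cmd.wordLimit && !queue.isEmpty() && !queue.peek().command) {
                queue.poll(); // the command consumes this word
                taken++;
            }
        }
        List<String> leftover = new ArrayList<>();
        for (Token t : queue) {
            leftover.add(t.text);
        }
        return leftover;
    }

    public static void main(String[] args) {
        // The range command is followed directly by words, with no command
        // token in between, so the loop stops and the words are stranded.
        List<Token> broken = Arrays.asList(
                Token.command("+[mat-rev]", 0),
                Token.word("bread"), Token.word("of"), Token.word("life"));
        System.out.println(leftoverAfterCommands(broken)); // prints [bread, of, life]
    }
}
```

Under these assumptions, the stranded words are what the described bug would look like: the tokenizer would need to emit a command token (for example, a phrase command) between the range and the words for the loop to consume everything.
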
> In looking at this I noticed what looks like a design problem.
> Consistently, elsewhere in JSword, an interface defines a wall that
> BibleDesktop and JSword do not look behind. However, this is not the
> case for searching.
> 
> jsword.book.search
>     provides the interfaces for Search and Index and factories to get
>     implementation
> jsword.book.search.basic
>     provides abstract/partial implementation of the interfaces
> jsword.book.search.parse
>     provides an implementation of Searcher
> jsword.book.search.lucene
>     provides an implementation of Indexer
> 
> Based upon this I would have expected that no code (outside of the
> package) would have directly used jsword.book.search.parse code.
> 
> The reason I noticed this was that I wanted to create another searcher
> and get it from the search factory. (Start with a copy and fix bugs,
> while retaining the ability to use BibleDesktop by changing the
> factory's properties.)
> 
> What is being used are the syntax elements, to programmatically construct
> a search. I'm thinking that we need YAI (yet another interface) for
> SearchSyntax. This would be able to:
> 1) decorate individual words and phrases with appropriate syntax elements.
>     SearchSyntax ss = SearchSyntaxFactory.getSearchSyntax();
>     String decorated = ss.decorate(SyntaxType.STARTS_WITH, "bread of life");
>     decorated = ss.decorate(SyntaxType.FIND_ALL_WORDS, "son of man");
>     decorated = ss.decorate(SyntaxType.FIND_STRONG_NUMBERS, "1234 5678");
>     decorated = ss.decorate(SyntaxType.BEST_MATCH, "....");
>     decorated = ss.decorate(SyntaxType.PHRASE_SEARCH, "....");
>     ...
> 
> 2) create a token stream from a string.
>     Token[] tokens = ss.tokenize("search string");
>     or
>     TokenStream tokens = ss.tokenize("search string");
>     or
>     ...
> 
> 3) serialize a token stream to a string.
> 
> Input desired!
> 
>
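For discussion, the three responsibilities proposed above (decorate, tokenize, serialize) could be sketched in Java roughly as follows. Only the names `SearchSyntax`, `SyntaxType`, `decorate` and `tokenize` come from the quoted message; the `serialize` method name, the use of `List<String>` for tokens, the `ToySyntax` class and the markers it emits (quotes for phrases, a leading `+` per word) are invented for illustration and are not JSword's actual syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SearchSyntaxSketch {
    /** Two of the syntax types named in the proposal. */
    enum SyntaxType { PHRASE_SEARCH, FIND_ALL_WORDS }

    /** Hypothetical shape of the proposed interface. */
    interface SearchSyntax {
        String decorate(SyntaxType type, String text);
        List<String> tokenize(String search);
        String serialize(List<String> tokens);
    }

    /** Toy implementation; the markers are made up for this sketch. */
    static final class ToySyntax implements SearchSyntax {
        @Override
        public String decorate(SyntaxType type, String text) {
            switch (type) {
                case PHRASE_SEARCH: {
                    return "\"" + text + "\"";
                }
                case FIND_ALL_WORDS: {
                    List<String> marked = new ArrayList<>();
                    for (String word : tokenize(text)) {
                        marked.add("+" + word);
                    }
                    return serialize(marked);
                }
                default:
                    throw new IllegalArgumentException("Unsupported: " + type);
            }
        }

        @Override
        public List<String> tokenize(String search) {
            String trimmed = search.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<>();
            }
            // Whitespace splitting only; a real tokenizer would also
            // recognise command tokens.
            return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
        }

        @Override
        public String serialize(List<String> tokens) {
            return String.join(" ", tokens);
        }
    }

    /** Stands in for SearchSyntaxFactory.getSearchSyntax(). */
    static SearchSyntax getSearchSyntax() {
        return new ToySyntax();
    }

    public static void main(String[] args) {
        SearchSyntax ss = getSearchSyntax();
        System.out.println(ss.decorate(SyntaxType.PHRASE_SEARCH, "bread of life"));
        System.out.println(ss.decorate(SyntaxType.FIND_ALL_WORDS, "son of man"));
    }
}
```

The point of the sketch is that callers such as BibleDesktop would only see the interface and the factory method, so a second searcher with a different syntax could be swapped in without touching them.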