[sword-devel] [Wiki Frontend comparison] Proximity searches

DM Smith dmsmith at crosswire.org
Sun Mar 1 06:19:24 MST 2009


On Mar 1, 2009, at 3:33 AM, Ben Morgan wrote:

> On 01/03/2009, Peter von Kaehne <refdoc at gmx.net> wrote:
> It appears that all clucene/lucene capable frontends can do  
> proximity searches. BpBible exposes this via its GUI, others rely on  
> the clucene/lucene syntax.
>
> Q: Is there anything particular about bpbible's proximity searches  
> or do I simply use the wrong syntax?
>
> I get on the ABU in xiphos for the following search "god love"~15 32  
> results, but on bpbible with a proximity search limited to 15 words  
> distance I get 56 hits.
> BPBible's proximity is an approximation based on some average length  
> of word (~5 letters, I think... - though it may be calculated from  
> the module). So results may not be directly comparable.
>
> Looking at the list I am finding only single verse references in
> xiphos, but my understanding is that crossboundary searches should
> be possible.
>
> What am I doing wrong? Or is in fact crossboundary search not  
> possible in other frontends?
>
> I don't believe proper crossboundary search is possible (as lucene  
> has documents). BPBible also allows (for example) phrases to cross  
> verse boundaries. I don't think any of the others do.

You are mostly right about cross-boundary search, where the boundary
is a verse or chapter. JSword does have limited support for
cross-boundary search, but the user has to ask for it explicitly. It
is not automatic.

I have talked with the Lucene folk about searching adjacent documents  
and they don't see it being added to core. Their suggestion was to  
have multiple documents per verse. The first set of documents would be  
as it is today. Each document in the second set would span a fixed
number of verses, say 2 adjacent verses or the whole chapter. The size
rests on the assumption that a user would not search for phrases or
other things beyond that size.

A phrase search in the second set of documents would find phrases that  
did not exceed 2 verses or the chapter (using the given example).
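
As a rough illustration of the second set of documents, here is a
minimal Python sketch. It is not Lucene or JSword code: tokenize(),
build_window_docs() and phrase_search() are hypothetical helpers, the
verse references and text are made-up placeholders, and a window of 2
verses is assumed.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (a stand-in for an Analyzer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_window_docs(verses, size=2):
    """Build one document per run of `size` adjacent verses.

    Each document is (first_ref, last_ref, tokens). With size=1 this is
    the ordinary one-document-per-verse index.
    """
    docs = []
    for i in range(len(verses) - size + 1):
        window = verses[i:i + size]
        tokens = [tok for _, text in window for tok in tokenize(text)]
        docs.append((window[0][0], window[-1][0], tokens))
    return docs

def contains_phrase(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def phrase_search(docs, query):
    """Return the (first_ref, last_ref) of every document holding the phrase."""
    phrase = tokenize(query)
    return [(first, last) for first, last, tokens in docs
            if contains_phrase(tokens, phrase)]

# Placeholder data, not real module text.
verses = [
    ("Test 1:1", "alpha beta gamma"),
    ("Test 1:2", "delta epsilon"),
    ("Test 1:3", "zeta eta"),
]

single_docs = build_window_docs(verses, size=1)  # first set: one per verse
window_docs = build_window_docs(verses, size=2)  # second set: 2-verse spans
```

Searching single_docs for "gamma delta" finds nothing, because the
phrase straddles two verses, while the same search in window_docs
returns the span Test 1:1 to Test 1:2. The cost is a larger index,
since every verse is stored once per window that covers it.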

Another approach would be to use offsets (term positions) for the
words. Typically Lucene starts the position of the first term in the
document at 0, the second at 1, and so forth. This is a function of
the Analyzer and of the Field. But it is possible to change the
positions to be the term position in the source. (Note, this can be
used to store alternate terms, n-grams, word forms, etc. at the same
position.)
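
To make the idea concrete, here is a small Python model of numbering
terms by their position in the source rather than restarting at 0 for
every verse. This is only a sketch under assumptions: it does not use
the Lucene API, build_positional_index() and proximity_search() are
hypothetical names, the verse data is a placeholder, and distance is
counted simply as the difference between two term positions, which is
not necessarily how Lucene computes slop.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split into word tokens (a stand-in for an Analyzer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_positional_index(verses):
    """Map each term to a list of (source_position, verse_ref).

    The position counter runs across the whole source and is never
    reset at a verse boundary.
    """
    index = defaultdict(list)
    position = 0
    for ref, text in verses:
        for tok in tokenize(text):
            index[tok].append((position, ref))
            position += 1
    return index

def proximity_search(index, term_a, term_b, max_distance):
    """Return (ref_a, ref_b) for each pair of occurrences of the two
    terms that lie within `max_distance` positions of each other."""
    hits = []
    for pos_a, ref_a in index.get(term_a, []):
        for pos_b, ref_b in index.get(term_b, []):
            if pos_a != pos_b and abs(pos_a - pos_b) <= max_distance:
                hits.append((ref_a, ref_b))
    return hits

# Placeholder data, not real module text.
verses = [
    ("Test 1:1", "alpha beta gamma"),
    ("Test 1:2", "delta epsilon"),
    ("Test 1:3", "zeta eta"),
]
index = build_positional_index(verses)
```

Here "beta" sits at source position 1 and "epsilon" at position 4, so
proximity_search(index, "beta", "epsilon", 3) reports the pair
(Test 1:1, Test 1:2) even though the two words are in different
verses, and a limit of 2 reports nothing.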

While true cross-boundary search is useful in some situations, I don't
think it is very useful in general. I think most people will use
search to find a verse or a short passage.

For the most part, searching verses in isolation is more than  
sufficient.

I'd like to hear other thoughts on the usefulness of proper passage  
searching.

In Him,
	DM


