[sword-devel] [Wiki Frontend comparison] Proximity searches
DM Smith
dmsmith at crosswire.org
Sun Mar 1 06:19:24 MST 2009
On Mar 1, 2009, at 3:33 AM, Ben Morgan wrote:
> On 01/03/2009, Peter von Kaehne <refdoc at gmx.net> wrote:
>> It appears that all clucene/lucene capable frontends can do
>> proximity searches. BpBible exposes this via its GUI; the others rely on
>> the clucene/lucene query syntax.
>>
>> Q: Is there anything particular about bpbible's proximity searches,
>> or am I simply using the wrong syntax?
>>
>> On the ABU in xiphos, the search "god love"~15 gives me 32
>> results, but in bpbible a proximity search limited to a distance of
>> 15 words gives me 56 hits.
> BPBible's proximity is an approximation based on an average word
> length (~5 letters, I think, though it may be calculated from
> the module). So results may not be directly comparable.
>
>> Looking at the list, I find only single-verse references in
>> xiphos, but my understanding is that cross-boundary searches should
>> be possible.
>>
>> What am I doing wrong? Or is cross-boundary search in fact not
>> possible in other frontends?
>
> I don't believe proper cross-boundary search is possible (as lucene
> indexes each verse as a separate document). BPBible also allows (for
> example) phrases to cross verse boundaries. I don't think any of the
> others do.
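A minimal sketch of why the two counts need not agree: it contrasts a word-distance proximity check (roughly the idea behind a query like "god love"~15 within a single verse, though Lucene's sloppy-phrase scoring is more involved, e.g. reversed word order costs extra slop) with a character-window approximation of the kind Ben describes for BPBible. The 5-character average word length, the tokenizer, and the function names are assumptions for illustration only; this is not code from either frontend.

```python
import re

AVG_WORD_LEN = 5  # assumption: approximate characters per word, including the space

def tokens_with_offsets(text):
    """Return (lowercased word, word index, character offset) for each word."""
    return [(m.group().lower(), i, m.start())
            for i, m in enumerate(re.finditer(r"[A-Za-z']+", text))]

def near_by_words(text, a, b, max_words):
    """True if some occurrences of a and b are at most max_words words apart.

    a and b are expected in lowercase.
    """
    toks = tokens_with_offsets(text)
    pa = [i for w, i, _ in toks if w == a]
    pb = [i for w, i, _ in toks if w == b]
    return any(abs(x - y) <= max_words for x in pa for y in pb)

def near_by_chars(text, a, b, max_words, avg_len=AVG_WORD_LEN):
    """Approximate word distance with a window of max_words * avg_len characters."""
    toks = tokens_with_offsets(text)
    ca = [c for w, _, c in toks if w == a]
    cb = [c for w, _, c in toks if w == b]
    return any(abs(x - y) <= max_words * avg_len for x in ca for y in cb)
```

With a run of short words, e.g. `text = "god is in it as we go on to do so in he me by my love"`, the two words are 16 words apart, so `near_by_words(text, "god", "love", 15)` is false, while the character window (75 characters) still contains both and `near_by_chars` reports a hit. Long words would push the error the other way, so neither count is "wrong"; they measure different things.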
You are mostly right about cross-boundary search, where the boundary
is a verse or chapter. JSword does have limited support for
cross-boundary search, but the user has to ask for it explicitly. It
is not automatic.
I have talked with the Lucene folks about searching adjacent documents,
and they don't see it being added to the core. Their suggestion was to
index each verse in more than one document. The first set of documents
would be as it is today: one per verse. The second set of documents
would each span a fixed number of verses, say 2 adjacent verses or a
whole chapter. The size depends on the assumption that a user would
not search for phrases or other things longer than that.
A phrase search in the second set of documents would then find phrases
that span no more than 2 verses or the chapter (using the given example).
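The "multiple documents per verse" suggestion can be sketched without Lucene itself: keep the per-verse documents, add window documents that each concatenate a run of adjacent verses, and run phrase searches against the window set. The in-memory lists, the window size of 2, the toy verses, and the helper names are all hypothetical; in a real index these would be extra Lucene documents, not Python lists.

```python
import re

def words(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_documents(verses, window=2):
    """Return (per_verse_docs, window_docs) as lists of (start_verse, tokens).

    per_verse_docs: one document per verse, as today.
    window_docs: one document per run of `window` adjacent verses,
    tagged with the index of its first verse.
    """
    per_verse = [(i, words(v)) for i, v in enumerate(verses)]
    spans = []
    for start in range(len(verses) - window + 1):
        toks = []
        for v in verses[start:start + window]:
            toks.extend(words(v))
        spans.append((start, toks))
    return per_verse, spans

def phrase_hits(docs, phrase):
    """Start indices of documents containing the exact word sequence `phrase`."""
    target = words(phrase)
    n = len(target)
    return [i for i, toks in docs
            if any(toks[j:j + n] == target for j in range(len(toks) - n + 1))]
```

For toy verses `["alpha beta gamma", "delta epsilon zeta", "eta theta"]`, the phrase "gamma delta" is found in no per-verse document but in the window starting at verse 0. One design cost is visible here: a phrase inside a single verse also matches every window containing that verse, so a frontend would need to de-duplicate hits, and the index grows roughly by the window size.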
Another approach would be to use the term positions of the words.
Typically Lucene gives the first term of a document position 0, the
second position 1, and so forth. This is a function of the Analyzer
and of the Field. But it is possible to change the positions to be the
term's position in the source text. (Note: the same mechanism can be
used to store alternate terms, n-grams, word forms, etc. at the same
position.)
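A rough sketch of the position-based idea, assuming adjacent verses are indexed together (for example one document per chapter or book) so that their positions can be compared. Positions keep counting across verse boundaries instead of restarting at 0, and alternate forms are stored at the same position, the way a zero position increment would store them in a Lucene Analyzer. The pure-Python index, the `alternates` mapping, the toy verses, and the function names are hypothetical and only mimic the idea.

```python
import re
from collections import defaultdict

def build_positional_index(verses, alternates=None):
    """Map term -> list of (global_position, verse_index).

    Positions continue across verse boundaries, so distances between
    words in different verses are meaningful. `alternates` (hypothetical)
    maps a word to extra forms stored at the SAME position.
    """
    alternates = alternates or {}
    index = defaultdict(list)
    pos = 0
    for v_idx, verse in enumerate(verses):
        for w in re.findall(r"[a-z']+", verse.lower()):
            index[w].append((pos, v_idx))
            for alt in alternates.get(w, ()):
                index[alt].append((pos, v_idx))  # same position, no increment
            pos += 1
    return index

def near(index, a, b, max_distance):
    """Sorted (verse_of_a, verse_of_b) pairs where a and b are within max_distance positions."""
    return sorted({(va, vb)
                   for pa, va in index.get(a, [])
                   for pb, vb in index.get(b, [])
                   if abs(pa - pb) <= max_distance})
```

With the toy verses `["they loved the light", "and walked in it"]` and `alternates={"loved": ["love"], "walked": ["walk"]}`, a query for "love" near "walk" within 4 positions returns `[(0, 1)]`, a hit that spans two verses and is found through the alternate forms. A per-verse index with positions restarting at 0 could not express that distance.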
While true cross-boundary search is useful in some situations, I don't
think it is very useful in general. I think most people will use search
to find a verse or a short passage.
For the most part, searching verses in isolation is more than
sufficient.
I'd like to hear other thoughts on the usefulness of proper passage
searching.
In Him,
DM