[jsword-devel] Lucene search bug

DM Smith dmsmith555 at yahoo.com
Tue Jan 25 21:55:38 MST 2005


I had noted earlier that when I searched on "bread" in the KJV, I only 
got about 20 hits.

I have been looking into what is happening.

In doing so I found a bug which at first I thought might be 
related. It seems that the call
                BookData data = book.getData(subkey);
                String text = data.getPlainText();
returns the verse reference butted up against the verse text, as in:
Gen 1.1In the beginning God created the heavens and the earth.....

Turns out that the document is something like:
<div>
<title>Gen 1.1</title>
<verse>In the beginning...</verse>
</div>
(this is leaving out attributes and other details)

getPlainText() concatenates the text from all the children of the div 
element. It seems to me that it should only do so for verse text. The 
code does not distinguish whether the text belongs to a title, note, 
footnote or some other non-verse element.

How should it be? (In my copy, I have it skipping the title element.)
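
For discussion, here is a minimal sketch of the "skip the title element" 
behaviour. It is not the JSword code: it uses the JDK's built-in DOM parser 
instead of JSword's own classes, the class and method names are made up for 
illustration, and the element names come from the simplified document above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JSword.
public class VerseTextExtractor {

    /** Concatenates all text in the document, skipping every <title> subtree. */
    public static String plainTextWithoutTitles(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Depth-first walk that appends text nodes and ignores <title> elements. */
    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "title".equals(node.getNodeName())) {
            return; // assumption: titles are never part of the verse text
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<div><title>Gen 1.1</title><verse>In the beginning...</verse></div>";
        System.out.println(plainTextWithoutTitles(xml));
    }
}
```

A deny-list like this only handles the title. Notes and footnotes would 
still be concatenated, so an allow-list of verse elements may be the better 
choice.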

Anyway, enough of that digression from the indexing problem. I set a 
breakpoint that triggered when a verse contained "bread" and found that 
the data was in fact reaching the indexer.

Looking at the verses, it seemed that each contained "bread" more than 
once. This sent me down the wrong path of checking whether words in a 
verse were only being indexed if they occurred multiple times.

I then ran a bunch of searches on common words (Lord, God, Jesus, bread, 
...) and none of them came back with more than 21 verses. Also, after I 
removed the leading verse reference and then deleted and regenerated the 
index, the results were a different set of 20.

I think what is happening is that the search is not returning an 
exhaustive answer, but is trying to come up with the top 20.
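
To make the hypothesis concrete, here is a toy model in plain Java. It does 
not use Lucene or its API. The class, the method names, and the scoring 
(number of occurrences of the word) are assumptions for illustration only, 
and the "verses" are made-up strings. It shows how a ranked search cut off 
at a limit would produce both symptoms: a cap on the number of hits, and a 
preference for verses that contain the word more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the hypothesis; not Lucene code.
public class TopHitsModel {

    /** Counts whole-word, case-insensitive occurrences of word in text. */
    static int count(String text, String word) {
        int n = 0;
        String target = word.toLowerCase();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (token.equals(target)) {
                n++;
            }
        }
        return n;
    }

    /** Exhaustive search: every verse containing the word, in input order. */
    public static List<String> searchAll(List<String> verses, String word) {
        List<String> hits = new ArrayList<>();
        for (String verse : verses) {
            if (count(verse, word) > 0) {
                hits.add(verse);
            }
        }
        return hits;
    }

    /** Ranked search: same matches, best score first, truncated to limit. */
    public static List<String> searchTop(List<String> verses, String word, int limit) {
        List<String> hits = searchAll(verses, word);
        // Assumed score: occurrence count, highest first (the sort is stable).
        hits.sort((a, b) -> Integer.compare(count(b, word), count(a, word)));
        return new ArrayList<>(hits.subList(0, Math.min(limit, hits.size())));
    }

    public static void main(String[] args) {
        List<String> verses = Arrays.asList(
            "v1: bread", "v2: bread and bread", "v3: water", "v4: daily bread");
        System.out.println(searchAll(verses, "bread").size());
        System.out.println(searchTop(verses, "bread", 2));
    }
}
```

If something like searchTop is what the search is doing, then the fix is to 
ask for all matches rather than to change the indexing.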


