[jsword-devel] Indexing issues

Joe Walker joe at eireneh.com
Wed Jun 30 13:47:41 MST 2004

DM Smith wrote:

> Before 0.9.7 there was a problem with the KJV w/ Strongs as the XML is 
> not valid (it seems to be well formed) and the use of a validating 
> parser choked on the input. (i.e. on the "resp" element in the NT)
> I have had problems in the past where the program would run out of 
> memory. The few searches that I have tried did not work as I expected. 
> As a result of this and since there was enough to do elsewhere in the 
> program, I have not spent any time in the past with search (including 
> testing it).
> So I tried the following tests on KJV (1769) with Strongs:
> Remove the index directory, ~/.jsword/sword-KJV
> Display Rev 21 via passage lookup.
> Search for "chrysoprasus" (found only in Rev 21:20)
> I had to repeat the search.
> Try another search restricted to "Rev" for "prophecy"
> I tried to delete the index directory, ~/.jsword/sword-KJV and re-index.

My methodology was similar: I downloaded the latest KJV module first and 
searched for "aaron" - using the latest CVS.
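
On the validating-parser point above: a minimal sketch, in plain JAXP rather 
than JSword's own loading code, of checking only well-formedness so that input 
that is well formed but not valid still parses. The class and method names are 
hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper, not part of JSword. */
final class WellFormedCheck {
    /** Returns true if the text parses with a non-validating SAX parser. */
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Non-validating: elements are not checked against a DTD,
            // only the XML syntax itself is checked.
            factory.setValidating(false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Fatal (well-formedness) errors surface as SAXException.
            return false;
        } catch (Exception e) {
            // Configuration or I/O trouble is not a verdict on the input.
            throw new IllegalStateException(e);
        }
    }
}
```

An element the schema does not know about (such as "resp") passes this check 
as long as its tags are balanced; an unclosed tag does not.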

> In eclipse, with the latest code:
> The indexing proceeds along giving greater and greater percentages and 
> verses further along. The last percent that I see is 60% and the verse 
> is Rev 22:21. So visually the progress bar seems to indicate that it has 
> not completed, but quit at 60%.

Something is not right here. Rev 22:21 is the last verse, so it is well 
over 60% of the way through the Bible!
Are we using the same search engine? I'm fairly sure I'm on ser rather 
than lucene.
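
For reference, a rough sketch of the figure a per-verse meter ought to show. 
It assumes the usual KJV count of 31102 verses, and the helper is hypothetical, 
not taken from the JSword progress code.

```java
/** Hypothetical helper: maps a verse's position to a progress percentage. */
final class VerseProgress {
    /** Assumed verse count for the KJV. */
    static final int KJV_VERSES = 31102;

    /**
     * @param ordinal 1-based position of the verse just indexed
     * @param total   total number of verses in the work
     * @return percentage complete, rounded down, in the range 0..100
     */
    static int percent(int ordinal, int total) {
        if (total <= 0 || ordinal < 0 || ordinal > total) {
            throw new IllegalArgumentException(
                "ordinal " + ordinal + " out of range for total " + total);
        }
        // Use long arithmetic so 100 * ordinal cannot overflow.
        return (int) ((100L * ordinal) / total);
    }

    public static void main(String[] args) {
        // Rev 22:21 is the last verse, so its ordinal equals the total.
        System.out.println(percent(KJV_VERSES, KJV_VERSES));
    }
}
```

With one step per verse, reaching Rev 22:21 should read 100, not 60.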


> I have a coding practice I call the "Principle of Least Surprise." I 
> found it surprising that the verse I was viewing was replaced with the 
> search result. I would have expected it to open another tab. In my 
> opinion this would be more useful as each search would have its own tab.

That would have us move the search/match panel into the toolbar or a dialog.
It does occur to me that if we were to move the whole selection pane 
above the tabs then it would look a lot more like a tabbed web browser. 
However, I've never liked the way many tabbed browsers have a single URL 
box shared across many tabs. Having the URL suddenly change when a new 
tab is picked seems surprising to me.

Is this sort of change pre 1.0 or post 1.0?

> The message that was presented when the work was indexed said that the 
> results would be more accurate (forget the exact wording). I think that 
> the message is a bit misleading. Could we put up a dialog box asking 
> permission to index, stating that it may take a few minutes and then 
> block until it is done (still showing the progress meter) and once done, 
> the search would be performed?

Could do - in theory there is nothing to stop the ser engine providing 
partial results during indexing, so the "more accurate" thing is true.

> The metering could be more accurate with each verse being a step along 
> the way. Since the number of verses is known in advance the meter would 
> be more useful.
> With regard to the "in use" error that I had under WinXP I am not sure 
> what would be best. Under UNIX it is no problem as an open file can be 
> deleted by any process even if it is open by another. The directory 
> entry in the file system is deleted but it is not until the reference 
> count on the actual file reaches zero that the file is deleted. Windows 
> on the other hand is much more onerous in its handling of the file. It 
> cannot be deleted if it is in use. The question that comes to mind is: 
> "Should the program hold open file handles for indexes?" If I am 
> searching a bunch of bibles I may hit a resource limit.

The ideal for me would be to have the index being "Activatable" so when 
resources get low we can make a decision to deactivate some unused 
functions. Maybe the search engine does not implement Activatable.
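
Roughly what I have in mind, as a sketch only. The interface below is a 
simplified, hypothetical stand-in and not necessarily the signature of our 
Activatable; the index class is likewise invented for illustration. The file 
handle is held only while the index is active, so a deactivated index can be 
deleted even on Windows.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical, simplified contract for resources that can be switched off. */
interface SimpleActivatable {
    void activate() throws IOException;
    void deactivate() throws IOException;
    boolean isActive();
}

/** Hypothetical index wrapper that keeps its file open only while active. */
final class LazyIndexFile implements SimpleActivatable {
    private final Path file;
    private RandomAccessFile handle; // null while inactive

    LazyIndexFile(Path file) {
        this.file = file;
    }

    @Override
    public synchronized void activate() throws IOException {
        if (handle == null) {
            handle = new RandomAccessFile(file.toFile(), "r");
        }
    }

    @Override
    public synchronized void deactivate() throws IOException {
        if (handle != null) {
            handle.close(); // releases the OS file handle
            handle = null;
        }
    }

    @Override
    public synchronized boolean isActive() {
        return handle != null;
    }

    /** Reads the first byte of the index, activating on demand. */
    synchronized int readFirstByte() throws IOException {
        activate();
        handle.seek(0);
        return handle.read();
    }
}
```

A manager that notices resources running low could call deactivate() on the 
indexes not used recently; the next search reactivates them transparently.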

