[jsword-devel] Indexing issues
Joe Walker
joe at eireneh.com
Wed Jun 30 13:47:41 MST 2004
DM Smith wrote:
> Before 0.9.7 there was a problem with the KJV w/ Strongs as the XML is
> not valid (it seems to be well formed) and the use of a validating
> parser choked on the input. (i.e. on the "resp" element in the NT)
>
> I have had problems in the past where the program would run out of
> memory. The few searches that I have tried did not work as I expected.
> As a result of this and since there was enough to do elsewhere in the
> program, I have not spent any time in the past with search (including
> testing it).
>
> So I tried the following tests on KJV (1769) with Strongs:
> Remove the index directory, ~/.jsword/sword-KJV
> Display Rev 21 via passage lookup.
> Search for "chrysoprasus" (found only in Rev 21:20)
> I had to repeat the search.
> Try another search restricted to "Rev" for "prophecy"
> I tried to delete the index directory, ~/.jsword/sword-KJV and re-index.
My methodology was similar: I downloaded the latest KJV module first and
searched for "aaron" - using the latest CVS.
> In eclipse, with the latest code:
> The indexing proceeds along giving greater and greater percentages and
> verses further along. The last percent that I see is 60% and the verse
> is Rev 22:21. So visually the progress bar seems to indicate that it has
> not completed, but quit at 60%.
Something is not right here. Rev 22:21 is more than 60% of the way
through the Bible!
Are we using the same search engine? I'm fairly sure I'm on ser rather
than lucene.
...
> I have a coding practice I call the "Principle of Least Surprise." I
> found it surprising that the verse I was viewing was replaced with the
> search result. I would have expected it to open another tab. In my
> opinion this would be more useful as each search would have its own tab.
That would have us move the search/match panel into the toolbar or a dialog.
It does occur to me that if we were to move the whole selection pane
above the tabs then it would look a lot more like a tabbed web browser.
However, I've never liked the way many tabbed browsers have a single URL
box shared across many tabs. Having the URL suddenly change when a new
tab is picked seems surprising to me.
Is this sort of change pre 1.0 or post 1.0?
> The message that was presented when the work was indexed said that the
> results would be more accurate (forget the exact wording). I think that
> the message is a bit misleading. Could we put up a dialog box asking
> permission to index, stating that it may take a few minutes and then
> block until it is done (still showing the progress meter) and once done,
> the search would be performed?
Could do - in theory there is nothing to stop the ser engine providing
partial results during indexing, so the "more accurate" thing is true.
> The metering could be more accurate with each verse being a step along
> the way. Since the number of verses is known in advance the meter would
> be more useful.
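
A minimal sketch of per-verse metering, assuming the indexer knows the
total verse count up front and can call a helper once per verse. The
class name VerseProgress and its methods are hypothetical and are not
part of the existing JSword code:

```java
/** Hypothetical helper: maps "verses indexed so far" onto a 0-100 value. */
final class VerseProgress {
    private final int totalVerses;
    private int done;

    VerseProgress(int totalVerses) {
        if (totalVerses <= 0) {
            throw new IllegalArgumentException("totalVerses must be positive");
        }
        this.totalVerses = totalVerses;
    }

    /** Records one more indexed verse and returns the new percentage. */
    int step() {
        if (done < totalVerses) {
            done++;
        }
        return percent();
    }

    /** Integer percentage; reaches 100 only when the last verse is done. */
    int percent() {
        return (int) ((100L * done) / totalVerses);
    }
}
```

With something like this, the meter could only show 100% once the last
verse (Rev 22:21 for a whole Bible) has been indexed.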
>
> With regard to the "in use" error that I had under WinXP I am not sure
> what would be best. Under UNIX it is no problem as an open file can be
> deleted by any process even if it is open by another. The directory
> entry in the file system is deleted but it is not until the reference
> count on the actual file reaches zero that the file is deleted. Windows
> on the other hand is much more onerous in its handling of the file. It
> cannot be deleted if it is in use. The question that comes to mind is:
> "Should the program hold open file handles for indexes?" If I am
> searching a bunch of bibles I may hit a resource limit.
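
One possible answer to the file-handle question is to open the index
only for the duration of each search. The sketch below assumes a
plain-text index with one word per line, which is NOT how the real ser
or lucene indexes are stored; ShortLivedIndexAccess and its methods are
hypothetical names used only for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: never keep an index file open between searches. */
final class ShortLivedIndexAccess {
    /** Looks up a word; the file handle is closed before this returns. */
    static boolean contains(Path indexFile, String word) throws IOException {
        // Files.readAllLines opens, reads and closes the file in one call.
        List<String> lines = Files.readAllLines(indexFile, StandardCharsets.UTF_8);
        return lines.contains(word);
    }

    /** Deleting works on Windows too, provided no handle is still open. */
    static boolean deleteIndex(Path indexFile) throws IOException {
        return Files.deleteIfExists(indexFile);
    }
}
```

The trade-off is that every search pays the cost of reopening the
index, in exchange for no long-lived handles and no "in use" error on
re-index.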
The ideal for me would be to have the index being "Activatable" so when
resources get low we can make a decision to deactivate some unused
functions. Maybe the search engine does not implement Activatable.
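
A rough sketch of that idea, assuming a simplified interface with
no-argument activate/deactivate methods (the real Activatable in JSword
may well look different). ActivatableSketch and ActivationPool are
hypothetical names, and "resources get low" is approximated here by a
fixed limit on the number of active items:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical interface; the real JSword Activatable may differ. */
interface ActivatableSketch {
    void activate();
    void deactivate();
}

/** Keeps at most maxActive items active, dropping the least recently used. */
final class ActivationPool {
    private final int maxActive;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, ActivatableSketch> active =
            new LinkedHashMap<>(16, 0.75f, true);

    ActivationPool(int maxActive) {
        if (maxActive < 1) {
            throw new IllegalArgumentException("maxActive must be at least 1");
        }
        this.maxActive = maxActive;
    }

    /** Marks the item as in use, activating it first if necessary. */
    void use(String name, ActivatableSketch item) {
        if (active.get(name) == null) { // get() also refreshes the LRU position
            item.activate();
            active.put(name, item);
        }
        while (active.size() > maxActive) {
            Iterator<Map.Entry<String, ActivatableSketch>> it =
                    active.entrySet().iterator();
            Map.Entry<String, ActivatableSketch> eldest = it.next();
            eldest.getValue().deactivate(); // e.g. close the index's file handles
            it.remove();
        }
    }

    boolean isActive(String name) {
        return active.containsKey(name);
    }
}
```

An index whose deactivate() closes its files would then also release
its handles when it has not been searched recently.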
Joe.