[jsword-devel] Indexing

DM Smith dmsmith555 at yahoo.com
Tue Oct 5 04:23:16 MST 2004

From what I have seen, Lucene looks like a better solution than ser. 
Indexing takes about the same amount of time on my machine (7 
minutes), but uses far fewer resources. With ser I used to get the 
dreaded "Out of Memory" error even with the memory limit set to 512M, 
and it used to grab the entire CPU and thrash the disk (paging, I 
think). With Lucene, memory use stayed under 50M, CPU use peaked at 
75%, and the disk did not thrash.

So while the performance can be improved by caching the indexes on the 
server, I think this is good enough for 1.0.

Also, I think that we may discover other issues as more and more Bibles 
are indexed. For example, should accented text be indexed with or 
without the accents (e.g. Hebrew breathing, Greek accents, 
French/Spanish/German diacriticals, ...). Perhaps we want to have a 
transliteration index. We have these as post-1.0 tasks, and they may 
result in additional indexes or changes to the existing index.
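
To make the accent question concrete, here is a minimal sketch of one 
way to fold accents before indexing, using only the JDK's 
java.text.Normalizer. The class and method names are made up for 
illustration; this is not what JSword or Lucene does today. It 
decomposes each character and drops the combining marks, so the same 
folding would have to be applied to the search terms as well.

```java
import java.text.Normalizer;

public class AccentFolding {
    /**
     * Hypothetical helper: decompose to NFD, then drop all combining
     * marks (Unicode category M). Letters without a decomposition,
     * such as the German sharp s, are left unchanged.
     */
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final e.
        System.out.println(fold("caf\u00e9")); // prints "cafe"
    }
}
```

Note that this also strips Hebrew vowel points and Greek accents and 
breathings, which may or may not be what we want for every module.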

Versioning the indexes on the server would let us handle such changes 
as they come up.

Also, if we do put the indexes on the server, I think the application 
should be prepared to create an index for a module whose index has not 
been cached (for example a new module).
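
A rough sketch of that decision, assuming the server publishes a 
version number for each cached index. Everything here (the class, the 
enum, the version constant, the module names) is hypothetical and only 
meant to show the fallback logic, not an existing JSword API.

```java
import java.util.HashMap;
import java.util.Map;

public class IndexResolver {
    /** Hypothetical: the index format this build of the application understands. */
    public static final int SUPPORTED_VERSION = 1;

    /** Where the index for a module should come from. */
    public enum Source { DOWNLOADED, BUILT_LOCALLY }

    /**
     * serverIndexes maps a module name to the version of the index
     * cached on the server. A missing entry means nothing is cached.
     */
    public static Source resolve(String module, Map<String, Integer> serverIndexes) {
        Integer cached = serverIndexes.get(module);
        if (cached != null && cached == SUPPORTED_VERSION) {
            return Source.DOWNLOADED;
        }
        // Not cached (e.g. a new module) or cached in another format.
        return Source.BUILT_LOCALLY;
    }

    public static void main(String[] args) {
        Map<String, Integer> server = new HashMap<>();
        server.put("KJV", 1); // illustrative module name
        System.out.println(resolve("KJV", server));
        System.out.println(resolve("NewModule", server));
    }
}
```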

Another thought: would there be any advantage to having an index per 
testament? This would allow both indexing and searching to be 
multithreaded. Later we may want to add the ability to search for a 
word across multiple Bibles.
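
For what it is worth, searching several indexes in parallel might look 
something like the sketch below. The Index interface is a hypothetical 
stand-in for one per-testament (or per-Bible) index, not a Lucene or 
JSword type; only the java.util.concurrent calls are real. Each index 
is searched on its own thread and the hits are merged in the order the 
indexes were given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TestamentSearch {
    /** Hypothetical stand-in for one per-testament index. */
    public interface Index {
        List<String> search(String word);
    }

    /** Search every index on its own thread and concatenate the hits. */
    public static List<String> searchAll(List<Index> indexes, final String word) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (final Index index : indexes) {
                Callable<List<String>> task = () -> index.search(word);
                futures.add(pool.submit(task));
            }
            List<String> hits = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                hits.addAll(future.get()); // blocks until that index is done
            }
            return hits;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this buys anything depends on how much of the search time is 
disk-bound, so it would need measuring before we commit to it.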
