[jsword-devel] Indexing
DM Smith
dmsmith555 at yahoo.com
Tue Oct 5 04:23:16 MST 2004
From what I have seen, Lucene looks like a better solution than ser.
Indexing takes about the same amount of time on my machine (7 minutes),
but uses far fewer resources. With ser I used to get the dreaded "Out of
Memory" error even with the memory limit set to 512M, and it used to
grab the entire CPU and thrash the disk (paging, I think). With Lucene,
memory use stayed under 50M, CPU use peaked at 75%, and the disk did not
thrash.
So while the performance can be improved by caching the indexes on the
server, I think this is good enough for 1.0.
Also, I think that we may discover other issues as more and more Bibles
are indexed. For example, should accented text be indexed with or
without the accents (e.g. Hebrew breathing, Greek accents,
French/Spanish/German diacriticals, ...). Perhaps we want to have a
transliteration index. We have these as post-1.0 tasks, and they may
result in additional indexes or changes to the existing index.
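
To make the "with or without the accents" question concrete, here is a
minimal sketch of one way to strip accents before indexing. It is not
what JSword currently does. The class name AccentFolder is hypothetical,
and it assumes a Java version that provides java.text.Normalizer. The
idea is to decompose each character (NFD) and then drop the combining
marks, so that "été" would be indexed as "ete".

```java
import java.text.Normalizer;

// Hypothetical helper, for illustration only.
final class AccentFolder {
    private AccentFolder() {
    }

    // Decompose to NFD so that accents become separate combining marks,
    // then remove every combining mark (Unicode category M).
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

If we wanted both behaviours, the same text could be indexed twice, once
as-is and once folded, which is one way we could end up with the
additional indexes mentioned above.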
If we were to version the indexes on the server, that would solve
these problems.
Also, if we do put the indexes on the server, I think the application
should be prepared to create an index for a module whose index has not
been cached (for example a new module).
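
A rough sketch of how the application might decide between a cached
index and a local build, taking index versions into account. Everything
here is an assumption made for illustration: the IndexResolver class,
the CURRENT_INDEX_VERSION constant, and the map of server-side index
versions are not existing JSword code.

```java
import java.util.Map;

// Hypothetical decision logic, for illustration only.
final class IndexResolver {
    // Assumed version of the index format this application understands.
    static final int CURRENT_INDEX_VERSION = 2;

    enum Action { DOWNLOAD, BUILD_LOCALLY }

    private IndexResolver() {
    }

    // serverIndexVersions maps a module name to the version of the index
    // cached on the server. A module that is missing from the map has no
    // cached index (for example a new module).
    static Action resolve(String module, Map<String, Integer> serverIndexVersions) {
        Integer cached = serverIndexVersions.get(module);
        if (cached != null && cached == CURRENT_INDEX_VERSION) {
            return Action.DOWNLOAD;
        }
        // No cached index, or one built for a different index version.
        return Action.BUILD_LOCALLY;
    }
}
```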
Another thought: would there be any advantage to having an index per
testament? This would allow both the indexing and the searching to be
multithreaded. Later we may want to add the ability to search for a word
across multiple Bibles.
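
To illustrate the multithreaded search idea, here is a minimal sketch
that queries several indexes in parallel and merges the hits. It assumes
a Java version with java.util.concurrent and lambdas. The TestamentSearch
class is hypothetical, and each index is modelled as a plain function
from a query string to a list of verse references rather than as a real
Lucene searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical parallel search over several indexes, for illustration only.
final class TestamentSearch {
    private TestamentSearch() {
    }

    // Runs the query against every index on its own thread and returns
    // the hits concatenated in the order the indexes were given.
    static List<String> searchAll(List<Function<String, List<String>>> indexes, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Function<String, List<String>> index : indexes) {
                Callable<List<String>> task = () -> index.apply(query);
                pending.add(pool.submit(task));
            }
            List<String> hits = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                hits.addAll(result.get());
            }
            return hits;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The same shape would work for searching across multiple Bibles: pass one
function per Bible instead of one per testament.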