[sword-devel] Search optimized (still too slow)

Thu, 8 Apr 2004 12:42:18 -0600

Hi Joachim,

I share your concerns about slow Search performance. My usage of Bible
software is mostly Searching rather than using links into commentaries.
Typically, I am most interested in getting to a specific verse with as few
keystrokes/mouse-clicks as possible, and doing relatively simple Searches.
For example, which verses contain 'election'? Where is 'tree of life'
mentioned?

I'm curious: what is your "benchmark" that generates the timings you
provided:
> WEB:
> before: 0m8.233s
> after: 0m7.586s
>
> KJV:
> before: 1m35.769s
> after: 0m21.874s

What word(s)/phrases are you searching for with what options? What hardware?
For example, how long goes it take to "find 'Jesus'" within the WEB and
within the KJV?

For my own benchmarks I used a slow/obsolete Pentium III /450 mhz box
(Win95, 256 meg memory, UDMA-5 80 gig drive). The search for 'Jesus' within
the WEB and with The SWORD Project BibleCS 1.5.6  took about 10-11 seconds.
Within the KJV, it took about 130 seconds. Those numbers seem 'within the
ballpark' of your numbers above.

Just wondering how acceptable it is from a memory usage point of view to
proceed as follows:

I'm finding with BerBible/LcdBible that searches can be radically improved
(5x ?? to 100x ??) by reading the entire Bible text into memory one time,
with all tags stripped out. The current logic (with uncompressed text) seems
to read a line into memory, use FilterMgr to iteratively remove a certain
kind of tag each of multiple passes, then do the search. Then get the next
line and continue. This seems to happen all over again for the next search.
(Compressed texts seem to buffer a chapter or so at a time, rather than line
at a time.)

BerBible (memory buffer based) took 220 ms with the WEB (~50x faster).
LcdBible (one pass tag removal rather than FilterManager) took about 1600 ms
(~7x faster). The above timings should not slow down appreciably with KJV
because the 'de-tagging' is only done once. (I'll try to get some specifics
about KJV rather than WAG's.)

Obviously you are going to have a tough time finding the phrase "these are
the generations" until the "<WHO428>" and "<FI>" tags are removed from the
following KJV line.

Gen 2:4 KJV
These<WH0428> <FI>are<Fi> the generations<WH08435> of the heavens<WH08064>
and of the earth<WH0776>

To me, the Search function is perhaps THE key functionality that Bible
software provides for the non-scholar, and should be seriously optimized
(within resource constraints). The entire text of one Bible (OT+NT without
tags) will generally fit into about 4 meg, so obviously memory-based
searching isn't 'free'.

On my main computer, Sword.exe takes about 10.5 meg of memory. BerBible
takes about 8.8 meg. LcdBible uses about 4.5 meg. Searching across multiple
open texts (KJV + WEB + BBE + MKJV + etc.) probably isn't practical on a
less than latest/greatest computer. Load-time can get onerous with too much
pre-loading, also.

Another approach is to use a 'one-pass tag remover state-machine' optimized
for searching, rather than multiple passes thru each line using the
FilterManager. It is feasible to have logic to speed this filtering up
significantly. This approach isn't so resource hungry as searching within a
'de-tagged' memory buffer. This approach is used by LcdBible.

My $0.02,
Lynn A.
l.allan@att.net

----- Original Message ----- 
From: "Joachim Ansorg" <junkmail@joachim.ansorgs.de>
To: <sword-devel@crosswire.org>
Sent: Thursday, April 08, 2004 7:59 AM
Subject: [sword-devel] Search optimized (still too slow)

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
> I spent some time to optimize the search in CVS.
> The problem is/was for example the extensive the use of XMLTag in the
filters,
> I tried to avoid them in the filters where it was possible without having
to
> rewrite them.
> I also used SWBuf::append directly where SWBuf::operator+ was used before.
>
> I see some good chances where we can optimize:
> -Using XMLTag as few as possible
> -Change copy constructor of SWBuf to implicit sharing, we have lots of
SWBuf
> copy-constructor calls I think
> -optimize SWBuf::append(char), maybe we can tweak the memory allocation to
> alloc larger blocks but more seldom. the append(char) function gets called
> more than any other function in a search
>
> But the best solution would be to parse the text only once and then do the
> right stuff with it. ATM each filter parses the text again which will make
> modules with lot's of filters slow (e.g. KJV).
>
> I got these results (with debug code and profiling code included):
> WEB:
> before: 0m8.233s
> after: 0m7.586s
>
> KJV:
> before: 1m35.769s
> after: 0m21.874s
>
>
> I have not yet committed, because I have to make sure the code doesn't
have
> some untested bugs.
>
> Joachim
> - -- 
> <>< Re: deemed!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQFAdVrUEyRIb2AZBB0RAps7AKC0fqFICmN2bMp5fc5ZTTgegyTn3QCghcjV
> 2yE6KnS4ma6u4YnVY7i7HSI=
> =J65x
> -----END PGP SIGNATURE-----
> _______________________________________________
> sword-devel mailing list
> sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel