[sword-devel] Re: Search optimized (still too slow)

Geoffrey W Hastings sword-devel@crosswire.org
Thu, 8 Apr 2004 16:05:22 -0700


I tried my own test.
Running Win XP Pro 384 meg ram 60 gig hard drive.
The results show another reason why we still need KJV Lite
WEB 5 seconds
KJV with Strongs 175 seconds (2 minutes 55 seconds) way too long.
KJV Lite 20 seconds
AKJV 17 seconds
MKJV 17 seconds

Geoff

On Thu, 8 Apr 2004 12:42:18 -0600 "Lynn Allan" <paracletos@adelphia.net>
writes:
> Hi Joachim,
> 
> I share your concerns about slow Search performance. My usage of 
> Bible
> software is mostly Searching rather than using links into 
> commentaries.
> Typically, I am most interested in getting to a specific verse with 
> as few
> keystrokes/mouse-clicks as possible, and doing relatively simple 
> Searches.
> For example, which verses contain 'election'? Where is 'tree of 
> life'
> mentioned?
> 
> I'm curious: what is your "benchmark" that generates the timings 
> you
> provided:
> > WEB:
> > before: 0m8.233s
> > after: 0m7.586s
> >
> > KJV:
> > before: 1m35.769s
> > after: 0m21.874s
> 
> What word(s)/phrases are you searching for with what options? What 
> hardware?
> For example, how long goes it take to "find 'Jesus'" within the WEB 
> and
> within the KJV?
> 
> For my own benchmarks I used a slow/obsolete Pentium III /450 mhz 
> box
> (Win95, 256 meg memory, UDMA-5 80 gig drive). The search for 'Jesus' 
> within
> the WEB and with The SWORD Project BibleCS 1.5.6  took about 10-11 
> seconds.
> Within the KJV, it took about 130 seconds. Those numbers seem 
> 'within the
> ballpark' of your numbers above.
> 
> Just wondering how acceptable it is from a memory usage point of 
> view to
> proceed as follows:
> 
> I'm finding with BerBible/LcdBible that searches can be radically 
> improved
> (5x ?? to 100x ??) by reading the entire Bible text into memory one 
> time,
> with all tags stripped out. The current logic (with uncompressed 
> text) seems
> to read a line into memory, use FilterMgr to iteratively remove a 
> certain
> kind of tag each of multiple passes, then do the search. Then get 
> the next
> line and continue. This seems to happen all over again for the next 
> search.
> (Compressed texts seem to buffer a chapter or so at a time, rather 
> than line
> at a time.)
> 
> BerBible (memory buffer based) took 220 ms with the WEB (~50x 
> faster).
> LcdBible (one pass tag removal rather than FilterManager) took about 
> 1600 ms
> (~7x faster). The above timings should not slow down appreciably 
> with KJV
> because the 'de-tagging' is only done once. (I'll try to get some 
> specifics
> about KJV rather than WAG's.)
> 
> Obviously you are going to have a tough time finding the phrase 
> "these are
> the generations" until the "<WHO428>" and "<FI>" tags are removed 
> from the
> following KJV line.
> 
> Gen 2:4 KJV
> These<WH0428> <FI>are<Fi> the generations<WH08435> of the 
> heavens<WH08064>
> and of the earth<WH0776>
> 
> To me, the Search function is perhaps THE key functionality that 
> Bible
> software provides for the non-scholar, and should be seriously 
> optimized
> (within resource constraints). The entire text of one Bible (OT+NT 
> without
> tags) will generally fit into about 4 meg, so obviously 
> memory-based
> searching isn't 'free'.
> 
> On my main computer, Sword.exe takes about 10.5 meg of memory. 
> BerBible
> takes about 8.8 meg. LcdBible uses about 4.5 meg. Searching across 
> multiple
> open texts (KJV + WEB + BBE + MKJV + etc.) probably isn't practical 
> on a
> less than latest/greatest computer. Load-time can get onerous with 
> too much
> pre-loading, also.
> 
> Another approach is to use a 'one-pass tag remover state-machine' 
> optimized
> for searching, rather than multiple passes thru each line using the
> FilterManager. It is feasible to have logic to speed this filtering 
> up
> significantly. This approach isn't so resource hungry as searching 
> within a
> 'de-tagged' memory buffer. This approach is used by LcdBible.
> 
> My $0.02,
> Lynn A.
> l.allan@att.net
> 
> ----- Original Message ----- 
> From: "Joachim Ansorg" <junkmail@joachim.ansorgs.de>
> To: <sword-devel@crosswire.org>
> Sent: Thursday, April 08, 2004 7:59 AM
> Subject: [sword-devel] Search optimized (still too slow)
> 
> 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi,
> > I spent some time to optimize the search in CVS.
> > The problem is/was for example the extensive the use of XMLTag in 
> the
> filters,
> > I tried to avoid them in the filters where it was possible without 
> having
> to
> > rewrite them.
> > I also used SWBuf::append directly where SWBuf::operator+ was used 
> before.
> >
> > I see some good chances where we can optimize:
> > -Using XMLTag as few as possible
> > -Change copy constructor of SWBuf to implicit sharing, we have 
> lots of
> SWBuf
> > copy-constructor calls I think
> > -optimize SWBuf::append(char), maybe we can tweak the memory 
> allocation to
> > alloc larger blocks but more seldom. the append(char) function 
> gets called
> > more than any other function in a search
> >
> > But the best solution would be to parse the text only once and 
> then do the
> > right stuff with it. ATM each filter parses the text again which 
> will make
> > modules with lot's of filters slow (e.g. KJV).
> >
> > I got these results (with debug code and profiling code 
> included):
> > WEB:
> > before: 0m8.233s
> > after: 0m7.586s
> >
> > KJV:
> > before: 1m35.769s
> > after: 0m21.874s
> >
> >
> > I have not yet committed, because I have to make sure the code 
> doesn't
> have
> > some untested bugs.
> >
> > Joachim
> > - -- 
> > <>< Re: deemed!
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.2.4 (GNU/Linux)
> >
> > iD8DBQFAdVrUEyRIb2AZBB0RAps7AKC0fqFICmN2bMp5fc5ZTTgegyTn3QCghcjV
> > 2yE6KnS4ma6u4YnVY7i7HSI=
> > =J65x
> > -----END PGP SIGNATURE-----
> > _______________________________________________
> > sword-devel mailing list
> > sword-devel@crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> 
> 
> _______________________________________________
> sword-devel mailing list
> sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> 
> 

________________________________________________________________
The best thing to hit the Internet in years - Juno SpeedBand!
Surf the Web up to FIVE TIMES FASTER!
Only $14.95/ month - visit www.juno.com to sign up today!