[sword-devel] Re: Search optimized (still too slow)
Geoffrey W Hastings
sword-devel@crosswire.org
Thu, 8 Apr 2004 16:05:22 -0700
I tried my own test, running Win XP Pro with 384 meg RAM and a 60 gig hard drive.
The results show another reason why we still need KJV Lite:
WEB               5 seconds
KJV with Strongs  175 seconds (2 minutes 55 seconds), way too long
KJV Lite          20 seconds
AKJV              17 seconds
MKJV              17 seconds
Geoff
On Thu, 8 Apr 2004 12:42:18 -0600 "Lynn Allan" <paracletos@adelphia.net>
writes:
> Hi Joachim,
>
> I share your concerns about slow Search performance. My usage of Bible
> software is mostly Searching rather than using links into commentaries.
> Typically, I am most interested in getting to a specific verse with as
> few keystrokes/mouse-clicks as possible, and doing relatively simple
> Searches. For example, which verses contain 'election'? Where is 'tree
> of life' mentioned?
>
> I'm curious: what is your "benchmark" that generates the timings you
> provided:
> > WEB:
> > before: 0m8.233s
> > after: 0m7.586s
> >
> > KJV:
> > before: 1m35.769s
> > after: 0m21.874s
>
> What word(s)/phrases are you searching for with what options? What
> hardware? For example, how long does it take to "find 'Jesus'" within
> the WEB and within the KJV?
>
> For my own benchmarks I used a slow/obsolete Pentium III/450 MHz box
> (Win95, 256 meg memory, UDMA-5 80 gig drive). The search for 'Jesus'
> within the WEB with The SWORD Project BibleCS 1.5.6 took about 10-11
> seconds. Within the KJV, it took about 130 seconds. Those numbers seem
> 'within the ballpark' of your numbers above.
>
> Just wondering how acceptable it is from a memory usage point of view
> to proceed as follows:
>
> I'm finding with BerBible/LcdBible that searches can be radically
> improved (5x ?? to 100x ??) by reading the entire Bible text into
> memory one time, with all tags stripped out. The current logic (with
> uncompressed text) seems to read a line into memory, use FilterMgr to
> iteratively remove a certain kind of tag on each of multiple passes,
> then do the search. Then it gets the next line and continues. This
> seems to happen all over again for the next search. (Compressed texts
> seem to buffer a chapter or so at a time, rather than a line at a time.)
>
> BerBible (memory buffer based) took 220 ms with the WEB (~50x faster).
> LcdBible (one pass tag removal rather than FilterManager) took about
> 1600 ms (~7x faster). The above timings should not slow down
> appreciably with KJV because the 'de-tagging' is only done once. (I'll
> try to get some specifics about KJV rather than WAGs.)
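
A minimal sketch of the memory-buffer idea described above, assuming the verses are already available as tagged strings. The names `stripTags` and `StrippedTextCache` are made up for illustration; this is not BerBible's or SWORD's actual code. Tags are removed once at load time, and every later search runs against the cached plain text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: drop everything between '<' and '>' (inclusive).
std::string stripTags(const std::string& raw) {
    std::string out;
    std::size_t pos = 0;
    while (pos < raw.size()) {
        std::size_t open = raw.find('<', pos);
        if (open == std::string::npos) {
            out.append(raw, pos, std::string::npos);  // no more tags
            break;
        }
        out.append(raw, pos, open - pos);             // text before the tag
        std::size_t close = raw.find('>', open);
        if (close == std::string::npos) break;        // unterminated tag: drop the rest
        pos = close + 1;
    }
    return out;
}

// Hypothetical cache: verses are de-tagged once and kept in memory.
class StrippedTextCache {
public:
    explicit StrippedTextCache(const std::vector<std::string>& rawVerses) {
        verses_.reserve(rawVerses.size());
        for (const std::string& v : rawVerses) verses_.push_back(stripTags(v));
    }

    // Indices of all verses containing the phrase (case-sensitive).
    std::vector<std::size_t> search(const std::string& phrase) const {
        std::vector<std::size_t> hits;
        for (std::size_t i = 0; i < verses_.size(); ++i)
            if (verses_[i].find(phrase) != std::string::npos) hits.push_back(i);
        return hits;
    }

private:
    std::vector<std::string> verses_;
};
```

The trade-off is the one mentioned in the quoted text: the cache costs roughly the size of the stripped Bible text in memory, in exchange for skipping the filtering work on every search.
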
>
> Obviously you are going to have a tough time finding the phrase "these
> are the generations" until the "<WH0428>" and "<FI>" tags are removed
> from the following KJV line.
>
> Gen 2:4 KJV
> These<WH0428> <FI>are<Fi> the generations<WH08435> of the
> heavens<WH08064>
> and of the earth<WH0776>
>
> To me, the Search function is perhaps THE key functionality that Bible
> software provides for the non-scholar, and should be seriously
> optimized (within resource constraints). The entire text of one Bible
> (OT+NT without tags) will generally fit into about 4 meg, so obviously
> memory-based searching isn't 'free'.
>
> On my main computer, Sword.exe takes about 10.5 meg of memory.
> BerBible takes about 8.8 meg. LcdBible uses about 4.5 meg. Searching
> across multiple open texts (KJV + WEB + BBE + MKJV + etc.) probably
> isn't practical on a less than latest/greatest computer. Load-time can
> get onerous with too much pre-loading, also.
>
> Another approach is to use a 'one-pass tag remover state-machine'
> optimized for searching, rather than multiple passes thru each line
> using the FilterManager. It is feasible to have logic to speed this
> filtering up significantly. This approach isn't so resource hungry as
> searching within a 'de-tagged' memory buffer. This approach is used by
> LcdBible.
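
The one-pass state machine described above could look roughly like this. It is only a sketch under the assumption that a tag is anything between '<' and '>'. `ScanState` and `lineContains` are hypothetical names, not LcdBible's code. Each line is stripped in a single pass into a scratch buffer that the caller reuses, so nothing larger than one line is held in memory.

```cpp
#include <string>

// Hypothetical two-state scanner: copy text, skip anything inside <...>.
enum class ScanState { Text, InTag };

// Strips tags from `raw` into `scratch` in one pass, then reports whether
// `phrase` occurs in the stripped text (case-sensitive).
bool lineContains(const std::string& raw, const std::string& phrase,
                  std::string& scratch) {
    scratch.clear();  // in practice this keeps the buffer's capacity for reuse
    ScanState state = ScanState::Text;
    for (char c : raw) {
        switch (state) {
        case ScanState::Text:
            if (c == '<') state = ScanState::InTag;
            else scratch.push_back(c);
            break;
        case ScanState::InTag:
            if (c == '>') state = ScanState::Text;
            break;
        }
    }
    return scratch.find(phrase) != std::string::npos;
}
```

Applied to the Gen 2:4 line quoted earlier, a search for "are the generations" fails on the raw text but succeeds once the scanner has skipped the tags.
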
>
> My $0.02,
> Lynn A.
> l.allan@att.net
>
> ----- Original Message -----
> From: "Joachim Ansorg" <junkmail@joachim.ansorgs.de>
> To: <sword-devel@crosswire.org>
> Sent: Thursday, April 08, 2004 7:59 AM
> Subject: [sword-devel] Search optimized (still too slow)
>
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi,
> > I spent some time optimizing the search in CVS.
> > The problem is/was, for example, the extensive use of XMLTag in the
> > filters. I tried to avoid it in the filters where that was possible
> > without having to rewrite them.
> > I also used SWBuf::append directly where SWBuf::operator+ was used
> > before.
> >
> > I see some good chances where we can optimize:
> > -Using XMLTag as little as possible
> > -Change copy constructor of SWBuf to implicit sharing; we have lots of
> > SWBuf copy-constructor calls, I think
> > -Optimize SWBuf::append(char); maybe we can tweak the memory allocation
> > to alloc larger blocks but less often. The append(char) function gets
> > called more than any other function in a search
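
To illustrate the last point in the list (allocating larger blocks less often), here is a minimal sketch of a buffer that doubles its capacity whenever it fills up. `GrowBuf` is a made-up class for illustration and says nothing about how SWBuf is actually implemented. The starting size of 16 and the growth factor of 2 are arbitrary assumptions.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical buffer, NOT SWBuf: capacity doubles when full, so n calls to
// append(char) trigger only about log2(n) reallocations.
class GrowBuf {
public:
    GrowBuf() = default;
    ~GrowBuf() { delete[] data_; }
    GrowBuf(const GrowBuf&) = delete;             // copying left out of the sketch
    GrowBuf& operator=(const GrowBuf&) = delete;

    void append(char c) {
        if (len_ == cap_) grow();
        data_[len_++] = c;
    }

    std::size_t size() const { return len_; }
    std::size_t reallocations() const { return reallocs_; }
    std::string str() const { return len_ ? std::string(data_, len_) : std::string(); }

private:
    void grow() {
        std::size_t newCap = cap_ ? cap_ * 2 : 16;  // assumed initial size and factor
        char* bigger = new char[newCap];
        if (len_) std::memcpy(bigger, data_, len_);
        delete[] data_;
        data_ = bigger;
        cap_ = newCap;
        ++reallocs_;
    }

    char* data_ = nullptr;
    std::size_t len_ = 0;
    std::size_t cap_ = 0;
    std::size_t reallocs_ = 0;
};
```

With these assumptions, appending 1000 characters one at a time reallocates 7 times (16, 32, ..., 1024) instead of once per fixed-size block.
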
> >
> > But the best solution would be to parse the text only once and then
> > do the right stuff with it. ATM each filter parses the text again,
> > which will make modules with lots of filters slow (e.g. KJV).
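
One way to read "parse the text only once" is to split each line into text and tag tokens a single time and let every later step work on the tokens instead of re-scanning the raw string. The sketch below assumes that reading. `Token`, `tokenize`, `plainText` and `tagsWithPrefix` are hypothetical names and do not correspond to SWORD's filter API.

```cpp
#include <string>
#include <vector>

// Hypothetical token: either a run of plain text or the inside of one <...> tag.
struct Token {
    bool isTag;
    std::string value;
};

// Parse the raw line exactly once into text and tag tokens.
std::vector<Token> tokenize(const std::string& raw) {
    std::vector<Token> tokens;
    std::string current;
    bool inTag = false;
    for (char c : raw) {
        if (!inTag && c == '<') {
            if (!current.empty()) tokens.push_back({false, current});
            current.clear();
            inTag = true;
        } else if (inTag && c == '>') {
            tokens.push_back({true, current});
            current.clear();
            inTag = false;
        } else {
            current.push_back(c);
        }
    }
    if (!inTag && !current.empty()) tokens.push_back({false, current});
    return tokens;  // an unterminated trailing tag is dropped
}

// "Filter" 1: searchable plain text, built from the tokens without re-parsing.
std::string plainText(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens)
        if (!t.isTag) out += t.value;
    return out;
}

// "Filter" 2: collect tags whose content starts with a given prefix.
std::vector<std::string> tagsWithPrefix(const std::vector<Token>& tokens,
                                        const std::string& prefix) {
    std::vector<std::string> out;
    for (const Token& t : tokens)
        if (t.isTag && t.value.compare(0, prefix.size(), prefix) == 0)
            out.push_back(t.value);
    return out;
}
```

For example, the tokens of a tagged KJV line can feed both `plainText` (for searching) and `tagsWithPrefix(tokens, "WH")` (for tags such as `<WH0428>`), with the raw string scanned only once.
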
> >
> > I got these results (with debug code and profiling code included):
> > WEB:
> > before: 0m8.233s
> > after: 0m7.586s
> >
> > KJV:
> > before: 1m35.769s
> > after: 0m21.874s
> >
> >
> > I have not yet committed, because I have to make sure the code
> > doesn't have some untested bugs.
> >
> > Joachim
> > - --
> > <>< Re: deemed!
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.2.4 (GNU/Linux)
> >
> > iD8DBQFAdVrUEyRIb2AZBB0RAps7AKC0fqFICmN2bMp5fc5ZTTgegyTn3QCghcjV
> > 2yE6KnS4ma6u4YnVY7i7HSI=
> > =J65x
> > -----END PGP SIGNATURE-----
> > _______________________________________________
> > sword-devel mailing list
> > sword-devel@crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
>
>