[sword-devel] Search up to 5.8 times faster now :)

Troy A. Griffitts scribe at crosswire.org
Wed Jun 2 14:41:45 MST 2004


Joachim,
	Great job!  I haven't looked too closely at the code, but enough to get 
the idea.  Chris, I think Joachim added some logic for phrase search as 
well, though I didn't follow it when I skimmed the patch.

	Excited to post 1.5.8 someday.  Starting a new job has really been 
draining.

	-Troy.



Chris Little wrote:
> Does this only affect the multi-word search (not the phrase or regex 
> searches)?  It seems like we could achieve a similar gain in performance 
> for the phrase searches by splitting phrases into individual words, 
> applying your algorithm (search raw, then strip, then search again) to 
> limit the pool to those verses that include all of the words (regardless 
> of order), and then performing the current phrase search algorithm 
> (strip filters, then search) on that pool.  Just a thought.  There might 
> be some flawed logic that hasn't occurred to me.
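A minimal Python sketch of the phrase idea above, under stated assumptions: a module is modelled as a plain dict of key -> raw text, strip filters are plain functions, and `strip_text`, `remove_tags` and `phrase_search` are hypothetical names, not SWORD's API. The separate "search again" on the stripped words is folded into the phrase check, since stripped text that contains the phrase also contains each of its words. Like the raw-text pre-check it builds on, it assumes markup never splits a word in the raw text. The sample texts and markup are illustrative only.

```python
import re
from typing import Callable, Dict, List, Tuple

StripFilter = Callable[[str], str]  # hypothetical: text in, cleaned text out


def strip_text(raw: str, filters: List[StripFilter]) -> str:
    """Run every strip filter over the raw text in order."""
    text = raw
    for f in filters:
        text = f(text)
    return text


def remove_tags(text: str) -> str:
    """Toy strip filter: drop anything that looks like <markup>."""
    return re.sub(r"<[^>]*>", "", text)


def phrase_search(module: Dict[str, str], phrase: str,
                  filters: List[StripFilter]) -> Tuple[List[str], int]:
    """Return (matching keys, number of keys that had to be stripped)."""
    words = phrase.split()
    target = " ".join(words)  # phrase with normalized whitespace
    results: List[str] = []
    stripped_keys = 0
    for key, raw in module.items():
        # Pool selection on raw text: all words present, in any order.
        if not all(w in raw for w in words):
            continue
        # Only the pool is stripped and given the real phrase check.
        stripped_keys += 1
        stripped = " ".join(strip_text(raw, filters).split())
        if target in stripped:
            results.append(key)
    return results, stripped_keys


module = {
    "Gen 1:1": "In the <w>beginning</w> God created",
    "John 1:1": "In the beginning was the Word",
    "Example": "God was there in the beginning",
}
print(phrase_search(module, "beginning God", [remove_tags]))
# (['Gen 1:1'], 2): John 1:1 never reaches stripping; "Example" has both
# words but not in phrase order, so the phrase check rejects it.
```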
> 
> --Chris
> 
> Joachim Ansorg wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi,
>> the standard search function is now up to 5 times faster than before.
>>
>> Let me explain.
>> A search in a module did the following:
>>     1. Get the text of a key by running all the strip filters
>>     2. Search the search words in the stripped down text
>>     3. If it was found add it to the result
>> Assume a module with about 30000 keys (verses) and 6 strip filters.
>> This means the expensive StripText() function was called once per
>> filter per key: 30000*6 = 180000 times.
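A minimal Python sketch of the old loop described above, for illustration only: the dict-of-strings module, the filter functions and the names `strip_text`/`search_old` are hypothetical stand-ins, not SWORD's classes, and matching is plain substring search. It counts one call per filter per key, which is how the 30000*6 figure is reached.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: module = dict of key -> raw text; a strip filter is a
# function that takes text and returns cleaned-up text.
StripFilter = Callable[[str], str]


def strip_text(raw: str, filters: List[StripFilter], stats: Dict[str, int]) -> str:
    """Run every strip filter over the text, counting each filter call."""
    text = raw
    for f in filters:
        stats["filter_calls"] += 1
        text = f(text)
    return text


def search_old(module: Dict[str, str], words: List[str],
               filters: List[StripFilter]) -> Tuple[List[str], int]:
    """Old approach: strip every key first, then search the stripped text."""
    stats = {"filter_calls": 0}
    results: List[str] = []
    for key, raw in module.items():
        stripped = strip_text(raw, filters, stats)  # 1. strip every key
        if all(w in stripped for w in words):       # 2. search stripped text
            results.append(key)                     # 3. collect the hit
    return results, stats["filter_calls"]


# 30000 keys, 6 (do-nothing) filters: every key is stripped once.
module = {f"v{i}": ("God said" if i < 100 else "and it was so") for i in range(30000)}
filters = [lambda s: s] * 6
hits, calls = search_old(module, ["God"], filters)
print(len(hits), calls)  # 100 180000
```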
>>
>> Now we first search for the words in the raw text, and only the keys
>> that matched there are checked against the stripped-down text.
>> If we assume a normal query matches 100 keys in the raw text, the
>> StripText() function is called only 100*6 = 600 times, which saves a
>> lot of time.
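A comparable sketch of the new two-pass loop, using the same kind of hypothetical stand-ins (`strip_text`, `search_new` and the dict module are illustrative names, not SWORD's API). The raw pre-check is only safe if a word that matches in the stripped text also matches in the raw text; markup that splits a word (e.g. `Rev<tag>elation`) would make this sketch miss a hit.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: module = dict of key -> raw text; a strip filter is a
# function that takes text and returns cleaned-up text.
StripFilter = Callable[[str], str]


def strip_text(raw: str, filters: List[StripFilter], stats: Dict[str, int]) -> str:
    """Run every strip filter over the text, counting each filter call."""
    text = raw
    for f in filters:
        stats["filter_calls"] += 1
        text = f(text)
    return text


def search_new(module: Dict[str, str], words: List[str],
               filters: List[StripFilter]) -> Tuple[List[str], int]:
    """New approach: cheap raw-text check first, strip only the candidates."""
    stats = {"filter_calls": 0}
    results: List[str] = []
    for key, raw in module.items():
        if not all(w in raw for w in words):   # pass 1: raw text, no stripping
            continue
        stripped = strip_text(raw, filters, stats)
        if all(w in stripped for w in words):  # pass 2: confirm on stripped text
            results.append(key)
    return results, stats["filter_calls"]


# Same 30000-key module and 6 filters, but only the 100 raw hits get stripped.
module = {f"v{i}": ("God said" if i < 100 else "and it was so") for i in range(30000)}
filters = [lambda s: s] * 6
hits, calls = search_new(module, ["God"], filters)
print(len(hits), calls)  # 100 600
```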
>>
>> Old/new comparison:
>>     time ./old/examples/cmdline/search KJV Revelation
>>         real    0m18.912s
>>         user    0m18.090s
>>         sys     0m0.780s
>>
>>     time ./new/examples/cmdline/search KJV Revelation
>>         real    0m3.396s
>>         user    0m2.540s
>>         sys     0m0.830s
>> That is an improvement by a factor of 5.6 :)
>>
>>     ./new/examples/cmdline/search WEB God
>> only takes 2.1 secs now.
>>
>> Another example:
>>     time ./old/examples/cmdline/search KJV God
>>         real    0m20.371s
>>         user    0m18.130s
>>         sys     0m0.950s
>>
>>     time ./new/examples/cmdline/search KJV God
>>         real    0m5.566s
>>         user    0m4.730s
>>         sys     0m0.810s   
>> This is "only" 3.7 times faster, because searching the raw text
>> gives more hits, which means more calls to StripText(). I tested it
>> with a search for " ", which matches all verses, and it is as slow as
>> the old one. But usual search queries are a lot faster than before.
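The arithmetic behind both factors, plus a rough cost model: under this sketch's assumption, the new search strips raw_hits*filters times, so it only wins when the raw text matches far fewer keys than the module holds. The timings are the "real" values quoted above; `strip_calls` is a hypothetical helper, not part of SWORD.

```python
# "real" times in seconds from the measurements above: (old, new).
timings = {
    "KJV Revelation": (18.912, 3.396),
    "KJV God": (20.371, 5.566),
}


def strip_calls(raw_hits: int, n_filters: int) -> int:
    """Hypothetical cost model: each raw hit is stripped once per filter."""
    return raw_hits * n_filters


for query, (old, new) in timings.items():
    print(f"{query}: {old / new:.1f}x faster")
# KJV Revelation: 5.6x faster
# KJV God: 3.7x faster

# More raw hits means more stripping; a query matching every verse (like " ")
# strips all ~30000 keys again, the same work as the old search.
print(strip_calls(100, 6), strip_calls(30000, 6))  # 600 180000
```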
>>
>> The fix is in CVS now.
>>
>> Joachim
>> - -- <>< Re: deemed!
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.2.4 (GNU/Linux)
>>
>> iD8DBQFAvlP4EyRIb2AZBB0RAqF0AKC+VgR5O3Ex9kmgtP8U6vlOgD82GwCfTapO
>> yCdN4G7E22dFk6oz09wAXXY=
>> =gqKO
>> -----END PGP SIGNATURE-----
>> _______________________________________________
>> sword-devel mailing list
>> sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel