[sword-devel] Some other fast search index options

Jerry Hastings sword-devel@crosswire.org
Wed, 27 Sep 2000 14:04:42 -0700


At 03:05 PM 9/17/2000 +0200, Nathan wrote:
>Good day Jerry Hastings and others

Likewise, and I am sorry this reply has taken so long.

>On 3rd September, Jerry Hastings wrote:
> > Here are three more methods to try.  Consider the bit map
> > divided into records, of say 32, 16, or 8 bits each.

>The problem is that we are still compressing the BITMAP files.
>The uncompressed bitmap files are 48.8 Mbyte. Compressing this
>down to 600K is impressive, but we can do better by rather
>trying to compress the verse-lists (list of verses in which the
>word occurs). The uncompressed size of these lists is 1.23 Mb.

One would not use those methods I suggested for max compression. What they 
provide is a simple algorithm that quickly reproduces sections of bit maps, 
with their locations in the bit map, to OR/AND with another bit map. The 
'inverted lists' are based on the same RLE kind of scheme, but the bit map 
record length is one bit and you don't need to store that image as all 
records looks the same. So, you only store the deltas. The draw back with 
the inverted lists is that you end up with only one bit of a bit map at a 
time. But with CPUs as fast as they are, decompressing and converting bits 
to bytes is probably faster than reading longer uncompressed files. 
Anything that performs searches under a second is fast enough, unless it is 
running in a multi-user server application, as some web based uses could be.


>To use this index, the following will have to be done:
>1. Evaluate the search query
>2. Get the words in the word-list.
>3. Get their corresponding compressed delta-lists.
>4. Uncompress.
>5. Convert delta-list to verse-list
>6. Convert to bitmap if you need to do NOT, OR, AND, or
>    wildcards

I would guess that step 5 is skipped if you do step 6.


>Question: Is it really worth all that effort to save 400K
>on the size of the index? A few years ago I would have done
>it, but with the smallest hard disks for PC's being 8 Gb
>nowadays, is it really a big deal to have an index of 900K?

My latest HD is 20gig.. Sword being compressed will not mean much to me, 
unless it helps to speed it up. It may make a difference to those that want 
to use it on hand helds and lap tops.

>If you really want to compress, then why keep the text of the
>Bible uncompressed? You can save much more compressing that! :)

True.

Jerry