[sword-devel] Fast search index options

Jerry Hastings sword-devel@crosswire.org
Sat, 02 Sep 2000 23:57:58 -0700


At 10:55 PM 9/2/2000 -0700, Jerry Hastings wrote:

>3) Split the bit map into bytes. If the first byte is non-zero, count the 
>number of bytes to the first zero byte, but not more than 128 bytes. Add 
>128 to the count and save as the first byte of a file, followed by the 
>counted non-zero bytes. If the first byte was zero, count the number of 
>bytes until a non-zero byte, but not more than 128 bytes. Save the count 
>as the first byte of a file, but do not save the zero bytes. Start again 
>at the first byte after the bytes counted above and keep repeating. For 
>some words, like "the", NOT the bit map first--flip 1s and 0s.
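
A minimal sketch of method 3 as quoted above, in Python for brevity (SWORD itself is C++). The function names are hypothetical. One assumption is not in the quote. A literal run of 128 bytes would need a header of 128 + 128 = 256, which does not fit in one byte, so this sketch caps every run at 127. That leaves header values 1..127 for skipped zero runs and 129..255 for literal runs. The decoder assumes the stream is well formed.

```python
MAX_RUN = 127  # assumption: 128 + 128 would overflow a single header byte


def encode_method3(bitmap: bytes) -> bytes:
    """Run-length encode a bit map (already split into bytes) per method 3.

    Header 128 + n (n in 1..127): the next n non-zero bytes are stored as-is.
    Header n (n in 1..127): n zero bytes, which are not stored.
    """
    out = bytearray()
    i = 0
    while i < len(bitmap):
        j = i
        if bitmap[i] != 0:
            # count non-zero bytes up to the next zero byte (capped)
            while j < len(bitmap) and bitmap[j] != 0 and j - i < MAX_RUN:
                j += 1
            out.append(128 + (j - i))
            out += bitmap[i:j]
        else:
            # count zero bytes up to the next non-zero byte (capped)
            while j < len(bitmap) and bitmap[j] == 0 and j - i < MAX_RUN:
                j += 1
            out.append(j - i)
        i = j  # start again after the bytes just counted
    return bytes(out)


def decode_method3(data: bytes) -> bytes:
    """Rebuild the bit map bytes from a method 3 stream."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header > 128:
            n = header - 128
            out += data[i:i + n]  # literal non-zero bytes
            i += n
        else:
            out += bytes(header)  # run of zero bytes
    return bytes(out)
```

For example, ten zero bytes, then 5 and 7, then three zero bytes encode as `10, 130, 5, 7, 3`. For words like "the", the NOT step in the quote amounts to encoding `bytes(b ^ 0xFF for b in bitmap)` and inverting again after decoding.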

After reconsidering this method, I see an improvement that would not be 
hard to include. First, the NOT version is really the same as just 
comparing each byte to 255 instead of zero. But instead of comparing only 
to zero and 255, the complete bit map, since it is produced first, could be 
analyzed to find the byte value that compresses best. Use an array 
dimensioned 0 to 255. While building the map, whenever a string of bytes of 
value x is found that is three bytes or longer,

let array[x] = array[x] + len(string) - 2.
Then, when the map is finished, the element of the array with the greatest 
value indicates which byte value to compare against when compressing: if 
array[x] has the greatest value, then x is the value to compare. This value 
could be stored as the first byte of the compressed file, followed by the 
data produced by method 3 above. In most cases, though, the value will be 
zero; only for very frequent words, like "the", would it be anything else.
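
A rough Python sketch of this refinement, with hypothetical function names. The scoring (add len - 2 for every run of three or more bytes of value x) follows the description above. As in the earlier sketch, capping runs at 127 bytes is my assumption so that every header fits in one byte. On a tie the lowest byte value wins, so a map with no qualifying runs falls back to zero.

```python
from itertools import groupby

MAX_RUN = 127  # assumption: keeps 128 + count within one byte


def best_skip_value(bitmap: bytes) -> int:
    """Return the byte value whose long runs save the most space."""
    score = [0] * 256  # the "array dim 0 to 255"
    for value, run in groupby(bitmap):
        length = sum(1 for _ in run)
        if length >= 3:
            score[value] += length - 2
    # max() keeps the first maximum, so 0 wins when nothing qualifies
    return max(range(256), key=lambda x: score[x])


def encode_with_skip(bitmap: bytes) -> bytes:
    """Method 3, but skipping runs of the chosen value instead of zero."""
    skip = best_skip_value(bitmap)
    out = bytearray([skip])  # first byte records the value compared against
    i, n = 0, len(bitmap)
    while i < n:
        j = i
        if bitmap[i] != skip:
            while j < n and bitmap[j] != skip and j - i < MAX_RUN:
                j += 1
            out.append(128 + (j - i))  # literal run header
            out += bitmap[i:j]
        else:
            while j < n and bitmap[j] == skip and j - i < MAX_RUN:
                j += 1
            out.append(j - i)  # skipped run, bytes not stored
        i = j
    return bytes(out)


def decode_with_skip(data: bytes) -> bytes:
    """Invert encode_with_skip, reading the skip value from byte 0."""
    skip = data[0]
    out = bytearray()
    i = 1
    while i < len(data):
        header = data[i]
        i += 1
        if header > 128:
            count = header - 128
            out += data[i:i + count]
            i += count
        else:
            out += bytes([skip]) * header
    return bytes(out)
```

For a "the"-like map of twenty 255 bytes, one 254, then five more 255 bytes, the chosen value is 255 and the output is `255, 20, 129, 254, 5`. This has the same effect as applying NOT first, without a separate flag.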

Jerry
