[sword-devel] Fast search -- some ideas
Stephen Denne
sword-devel@crosswire.org
Tue, 5 Sep 2000 23:48:04 +1200
G'day,
I took a quick look at indexing searches for the purposes of getting some
kind of complete app on a 1MB Palm Pilot... I decided that if I was to do
any coding (I'm more likely to upgrade my hardware), I'd use what you're
calling option 1, and simply leave out the most common words.
I was considering having a tree data structure where each record in the tree
containg the next letter (or unique letters) in the word, a (4 byte) pointer
to the next record in the same level, a (2 byte) pointer to the first record
of the next level, and the (zero-terminated) verselist, (2 bytes per verse).
The tree is ... er... climbed(?)... as the user enters the word, skipping as
many records as required till the entered letter is found (It would be a
simple matter to optimise this by ordering the index records within a level
by the total number of related verses), and then the resulting verse list is
immediately available. This also leads to an easy stemming implementation -
including (ORing) everything below that point in the tree.
This stucture saves space in that it doesn't require every letter of every
word to be stored (dictionary integrated with the concordance), but at the
same time makes any implementation of multiple spellings/languages using the
same index rather tricky. It also makes wildcard searches incredibly complex
(obviously I wasn't going to implement them).
If space were a huge concern you could save most of the last 12560 by
indicating that a bitmap is to be expected instead of a verselist with a
special verse pointer of 0 or FFFF
Stephen Denne.
There is... a time to search and a time to give up as lost; A time to keep
and a time to throw away.