[sword-devel] Fast search
Nathan
sword-devel@crosswire.org
Sun, 17 Sep 2000 15:06:03 +0200
Good day Troy and others
On 8th September, Troy Griffitts wrote:
> I hope no one is getting discouraged as to what should be
> developed for a fast search framework.
I need to ask a few questions at this point.
I have only joined the list recently, so if I am asking
some things which have been decided, please tell me so :)
--- Start of questions ---
Question 1:
How much effort do we really want to spend on getting the index
as small as possible?
Is it really worth all the effort to do arithmetic coding
on some delta list, and do 3 conversions to save 400K
on the size of the index? (See my earlier answer to
Jerry Hastings) A few years ago I would have done
it, but with the smallest hard disks for PCs being 8 GB
nowadays, is it really a big deal to have an index of 900K?
If you really want to save space and do compression, then
why keep the text of the Bible uncompressed? You can save
much more compressing that! :)
It is the usual question of speed vs. simplicity vs. space.
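To make the trade-off concrete, here is a rough sketch of what a "delta list" buys you. It is only an illustration: the verse numbers are made up, and it uses simple variable-length bytes (7 bits per byte) as a stand-in for arithmetic coding, which is more involved. The function names are my own and not from any Sword code.

```python
def delta_encode(verses):
    """Turn a sorted list of verse numbers into the gaps between neighbours."""
    deltas = []
    previous = 0
    for v in verses:
        deltas.append(v - previous)
        previous = v
    return deltas


def delta_decode(deltas):
    """Rebuild the original verse numbers by summing the gaps."""
    verses = []
    total = 0
    for d in deltas:
        total += d
        verses.append(total)
    return verses


def varint_size(n):
    """Bytes needed to store n with 7 payload bits per byte."""
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


# Hypothetical verse list for one word (assumption: verse numbers fit in 16 bits).
verses = [3, 17, 18, 250, 31000]
deltas = delta_encode(verses)

raw_bytes = 2 * len(verses)                         # plain 16-bit numbers
packed_bytes = sum(varint_size(d) for d in deltas)  # gaps as variable-length bytes
```

The saving depends entirely on how close together the verses are, and every lookup pays for the decode step. That is the speed vs. simplicity vs. space question in miniature.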
Question 2:
The Sword program, along with related programs such as BibleTime,
will ultimately be for users as well, not just programmers.
If we look at the search from that perspective, we realize
that these users of the program do not know what a regular
expression is. They just want to type some words, and get
results (preferably fast). They might know something about
AND and OR and NOT and wildcards, and use these.
Why should they tell the program whether to use Multi-words,
phrase or regular expressions?
Should the program not be able to do all those from one
input box?
(My apologies if I am questioning design decisions which
you have made long ago.)
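As a rough sketch of the "one input box" idea: the program can look at what the user typed and work out for itself which parts are phrases, operators, wildcards or plain words. The token names and the rules below are my own assumptions (quotes mark a phrase, only upper-case AND/OR/NOT count as operators, * and ? are wildcards), not an existing Sword convention.

```python
import re

# A quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def classify(query):
    """Split one search box entry into (kind, text) tokens."""
    tokens = []
    for phrase, word in TOKEN.findall(query):
        if phrase:
            tokens.append(("PHRASE", phrase))
        elif not word:
            continue  # empty pair of quotes, nothing to search for
        elif word in ("AND", "OR", "NOT"):
            # Upper case only, so the ordinary word "and" stays searchable.
            tokens.append(("OP", word))
        elif "*" in word or "?" in word:
            tokens.append(("WILDCARD", word))
        else:
            tokens.append(("WORD", word.lower()))
    return tokens


example = classify('"in the beginning" AND god NOT light*')
```

The user never has to pick a search type; the token kinds tell the search engine which method to use for each part.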
Question 3:
So, what is expected from the fast search?
Do we want to keep the regular expressions (as Martin also
asked on 7th Sept), or do we build it into the fast search?
Must the index be small?
Must the fast search be the main search, or is it just an
extra add-on?
Question 4:
Depending on the answer to 3,
If the new fast search does everything, and we do not need
regular expressions, do we keep the text of the modules
as text, or do we compress it for space and speed reasons?
This could have some wider effects so you will have to be
careful on this one :)
--- End of questions ---
Given some answers/decisions on these, the fast search should
actually not take all that long.
(I have some code for generating word-lists, bitmaps,
verse-lists, parsing queries and their trees, bitmap
operations, etc. Some of it is in VB, C and SQL, but I can
at least give you some pseudo-code, which can be converted
to C++ very quickly.)
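To give an idea of what the bitmap part looks like, here is a small sketch. It is not the code mentioned above, just an illustration with made-up names: each word gets one bitmap with a bit per verse, and AND, OR and NOT become plain bit operations. The three sample verses are lower-cased and shortened for the example.

```python
def build_index(verses):
    """Map each word to an int bitmap; bit i is set when verse i contains the word."""
    index = {}
    for i, text in enumerate(verses):
        for word in set(text.lower().split()):
            index[word] = index.get(word, 0) | (1 << i)
    return index


def verse_list(bitmap):
    """Convert a bitmap back into a list of verse positions."""
    result = []
    i = 0
    while bitmap:
        if bitmap & 1:
            result.append(i)
        bitmap >>= 1
        i += 1
    return result


# Sample text only (assumption: one string per verse, already lower-cased).
verses = [
    "in the beginning god created the heaven and the earth",
    "and the earth was without form and void",
    "and god said let there be light",
]
index = build_index(verses)
all_verses = (1 << len(verses)) - 1  # mask so NOT stays inside the verse range

god_and_earth = verse_list(index["god"] & index["earth"])
god_or_earth = verse_list(index["god"] | index["earth"])
god_not_earth = verse_list(index["god"] & ~index["earth"] & all_verses)
```

With one bit per verse, a bitmap for the whole Bible is only a few KB per word, and combining two of them is a single pass over the bits, which is where the speed comes from.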
God bless,
nathan
http://www.nathan.co.za
(PS. The following is my personal opinion and is not to be
seen as trying to influence anybody's decisions...
1. Keep it at 900K. No compression. No 3x conversions.
2. Keep it simple for the users
3. Build it in
4. Don't compress it yet)