[sword-devel] Fast search
Joachim Ansorg
sword-devel@crosswire.org
Wed, 27 Sep 2000 21:57:52 +0000
Hi!
Here are now my answers.
> Question 1:
> How much effort do we really want to do to get the index
> as small as possible?
> Is it really worth all the effort to do arithmetic coding
> on some delta list, and do 3 conversions to save 400K
> on the size of the index? (See my earlier answer to
> Jerry Hastings) A few years ago I would have done
> it, but with the smallest hard disks for PC's being 8 Gb
> nowadays, is it really a big deal to have an index of 900K?
>
> If you really want to save space and do compression, then
> why keep the text of the Bible uncompressed? You can save
> much more compressing that! :)
> It is the usual question of speed vs. simplicity vs. space.
I have no idea how much work it is to implement the better compression. But
if it's a reasonable amount of work, please implement it!
We should do it correctly _now_; who knows on which platforms or operating
systems Sword will be used (BeOS, Palm, OSX, QNX ...), so we should be
prepared for every one of them!
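
To make the "delta list" idea concrete, here is a minimal sketch. It assumes each index entry is a sorted list of verse numbers, and it packs the gaps with a simple variable-length byte code instead of the arithmetic coding discussed in the question. The function names are made up for illustration and are not part of Sword.

```python
def encode_postings(verse_ids):
    """Delta-encode a sorted list of non-negative verse numbers,
    then pack each gap into 7-bit groups (low group first)."""
    out = bytearray()
    prev = 0
    for vid in verse_ids:
        gap = vid - prev          # small numbers when hits are close together
        prev = vid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)           # last byte of this gap, high bit clear
    return bytes(out)


def decode_postings(data):
    """Reverse encode_postings: rebuild the absolute verse numbers."""
    ids = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7            # gap continues in the next byte
        else:
            current += gap        # gap complete: add it to the running total
            ids.append(current)
            gap = 0
            shift = 0
    return ids


postings = [3, 17, 18, 250, 31000]
packed = encode_postings(postings)
assert decode_postings(packed) == postings
```

This needs only one pass in each direction and no conversions, so it sits somewhere between the plain 900K index and the fully compressed one in terms of effort.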
> Question 2:
> The Sword program with all its others like BibleTime, etc.
> will ultimately be for users as well, not just programmers.
> If we look at the search from that perspective, we realize
> that these users of the program do not know what a regular
> expression is. They just want to type some words, and get
> results (preferably fast). They might know something about
> AND and OR and NOT and wildcards, and use these.
> Why should they tell the program whether to use Multi-words,
> phrase or regular expressions?
> Should the program not be able to do all those from one
> input box?
> (My apologies if I am questioning design decisions which
> you have made long ago.)
Yes, I think your ideas are good!
As one of the programmers of BibleTime I want to add that keeping things
as simple as possible for the user is the best way!
But IMHO we should still offer case-(in)sensitive search.
How do we handle multi-word / exact search with your method? Or will we use
exact search with patterns (?, * etc.)?
> Question 3:
> So, what is expected from the fast search?
> Do we want to keep the regular expressions (as Martin also
> asked on 7th Sept), or do we build it into the fast search?
> Must the index be small?
> Must the fast search be the main search, or is it just an
> extra add-on?
The search should be as fast as possible!
It's annoying for users to think "Oh, very slow search!" and then to find out
there's a much faster one if you enable it!
I'm not sure, but if we use the patterns described above, maybe we could drop
the regexp things so that we offer only the fast search?
Martin, what do you think?
Having a slow and a fast search is bad; we should offer only one type of
search so that the speed is always the same.
> Question 4:
> Depending on the answer to 3,
> If the new fast search does everything, and we do not need
> regular expressions, do we keep the text of the modules
> as text, or do we compress it for space and speed reasons?
> This could have some wider effects so you will have to be
> careful on this one :)
I think modules should be compressed. They take too much disk space at the
moment, so users with small hard disks will run into problems with the
uncompressed module files + index files.
Troy, is it possible without changing everything?
I know it's a major thing which changes lots of things, but we should be
comparable to commercial products so users will have a good, stable, fast and
free Bible study suite (backend + different frontends).
We want to release BibleTime 1.0 as a free software product which is
comparable to commercial programs (ok, at least some). To do this we need the
compressed modules.
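
One possible way to compress modules while still being able to jump to a single verse is to compress the text in small blocks and keep a per-verse table of (block, offset, length). The sketch below shows that idea with zlib; the block size, the table layout and the function names are assumptions for illustration only and do not describe the Sword module format.

```python
import zlib


def build_module(verses, block_size=4):
    """Compress verses in blocks of block_size verses.
    Returns (blocks, index), where index[n] = (block_no, offset, length)
    locates verse n inside its uncompressed block."""
    blocks = []
    index = []
    for start in range(0, len(verses), block_size):
        raw = bytearray()
        for text in verses[start:start + block_size]:
            data = text.encode("utf-8")
            index.append((len(blocks), len(raw), len(data)))
            raw += data
        blocks.append(zlib.compress(bytes(raw)))
    return blocks, index


def read_verse(blocks, index, n):
    """Decompress only the block that holds verse n and cut the verse out."""
    block_no, offset, length = index[n]
    raw = zlib.decompress(blocks[block_no])
    return raw[offset:offset + length].decode("utf-8")


verses = ["Verse number %d of a sample module." % i for i in range(10)]
blocks, index = build_module(verses)
assert read_verse(blocks, index, 7) == verses[7]
```

Larger blocks compress better but make each lookup slower, so the block size is the knob for the speed vs. space question.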
Opinions?
--Joachim
> (PS. The following is my personal opinion and is not to be
> seen as trying to influence anybody's decisions...
> 1. Keep it at 900 Kb. No compression. No 3x conversions.
> 2. Keep it simple for the users
> 3. Build it in
> 4. Don't compress it yet)