[sword-devel] Another Important Issue
Troy A. Griffitts
sword-devel@crosswire.org
Tue, 29 Aug 2000 15:21:37 -0700
Joe and others that asked,
The code for our first attempt at word indices and fast searches is in:

    sword/src/modules/texts/rawtext/rawtext.cpp

      RawText::createSearchFramework  // creates the framework (done once)
      RawText::Search                 // uses the framework

Anyone want to reimplement these?
I know! Let's have a contest! :) Smallest indices with the fastest
_accurate_ response time wins.
:),
-Troy.
Joe Walker wrote:
>
> Nathan wrote:
> > In option 3, would the bitmap not be about 3.8K? (31102 verses / 8
> > bits per byte = 3888 bytes.) Otherwise it is a bytemap, not a
> > bitmap :)
>
> To put the size in perspective: if you take every word in the KJV,
> search for it, store the results using whichever of the 3 approaches
> mentioned is smallest, and put the lot in one big RandomAccessFile,
> then the total size is 4.5 MB.
>
> This is, in my opinion, a little on the large side if you want to
> download a new version. However, since the index only duplicates data
> already in the text, there is nothing to stop a clever installation
> script from creating it, or even a very clever caching search from
> creating it on the fly.
>
> > You are right that it is very fast. I use the same method.
> > For wildcards it is also really fast (just OR a few bitmaps).
> > The way to work around the huge size of the "bitmap index" is to
> > store it in another format (like a list or Ranged list) and
> > convert when needed.
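
A rough sketch in Java of the bitmap approach described above: one bit
per verse, with a wildcard search answered by ORing the bitmaps of every
word the pattern expands to. The names VerseBitmaps, bitmapBytes and
orAll are made up for illustration; they are not from Sword or from the
passage code mentioned in this thread, and the sketch assumes each
word's bitmap has already been built.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Sword or of the Java passage code.
class VerseBitmaps {
    // Number of verses in the KJV, as quoted in this thread.
    static final int VERSE_COUNT = 31102;

    // Bytes needed to hold one bit per verse, rounded up to a whole byte.
    static int bitmapBytes() {
        return (VERSE_COUNT + 7) / 8;
    }

    // A wildcard expands to several words; the verses matching the
    // wildcard are the union (bitwise OR) of each word's bitmap.
    static BitSet orAll(List<BitSet> wordBitmaps) {
        BitSet result = new BitSet(VERSE_COUNT);
        for (BitSet bitmap : wordBitmaps) {
            result.or(bitmap);
        }
        return result;
    }
}
```

At one bit per verse, bitmapBytes() comes to 3888 bytes for each indexed
word, which is small for one word but adds up over a whole vocabulary.
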
>
> I have a working scheme whereby you can do a best match. You type in
> your phrase; it first looks up every word you typed in a thesaurus
> and then searches for every match, returning the verse that hopefully
> has the most similar meaning.
> I find it very useful, but for it to work you do need a blindingly
> fast search mechanism.
>
> > I like your idea about the RangedPassage as well. It really makes
> > the list of verses for certain "common" words much smaller.
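
The ranged-list idea could be sketched like this in Java: store each
word's verses as inclusive runs of consecutive verse ordinals, and
expand to a bitmap only when a search needs one. VerseRanges, toRanges
and toBitmap are hypothetical names, and this is not the RangedPassage
class itself, just one way the conversion might look.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper; verses are identified by a 0-based ordinal.
class VerseRanges {
    // Collapse the set bits into inclusive {start, end} runs.
    static List<int[]> toRanges(BitSet verses) {
        List<int[]> ranges = new ArrayList<>();
        int start = verses.nextSetBit(0);
        while (start >= 0) {
            // The run ends just before the next clear bit.
            int end = verses.nextClearBit(start) - 1;
            ranges.add(new int[] {start, end});
            start = verses.nextSetBit(end + 1);
        }
        return ranges;
    }

    // Expand the runs back into a bitmap for fast AND/OR operations.
    static BitSet toBitmap(List<int[]> ranges) {
        BitSet verses = new BitSet();
        for (int[] range : ranges) {
            verses.set(range[0], range[1] + 1); // upper bound is exclusive
        }
        return verses;
    }
}
```

A word that occurs in long stretches of adjacent verses collapses to a
handful of pairs, while a rare word costs one pair per verse, so the
saving depends on how the matches cluster.
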
> >
> > Where is your program located Joe?
>
> There was a servlet version on the web, but I think it is broken right
> now. I've been working on a project for my brother (blood and in Christ)
> that needed to be done before his wedding, so I've not done much on it
> in the past few months.
>
> If you want to look at code, then I can send you whatever you want
> very quickly. If you want a working product, then I'll need a few more
> weeks.
>
> I've tarred up the code in question, and I'll place it at:
> http://www.eireneh.com/passage.tar.gz
> It is all Java, and will only be of use for case A above.
>
> Joe.