[sword-devel] Another Important Issue
Nathan
sword-devel@crosswire.org
Tue, 29 Aug 2000 21:52:00 +0200
Good day,
In option 3, would the bitmap not be about 3.8K? (31102 verses / 8 bits per byte is roughly 3888 bytes)
Else it is a bytemap, not a bitmap :)
You are right that it is very fast. I use the same method.
For wildcards it is also really fast (just OR a few bitmaps).
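As a rough illustration of the wildcard case, here is a minimal Java sketch. It assumes a hypothetical word index held in a TreeMap from each word to its verse bitmap (java.util.BitSet); the class and method names are made up for this example and are not taken from either program.

```java
import java.util.BitSet;
import java.util.TreeMap;

class WildcardSearch {
    // OR together the bitmaps of every indexed word that starts with prefix.
    // The index layout (word -> verse bitmap) is an assumption for this sketch.
    static BitSet prefixSearch(TreeMap<String, BitSet> index, String prefix) {
        BitSet result = new BitSet();
        // All keys with this prefix sort between prefix and prefix + '\uffff'.
        for (BitSet b : index.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            result.or(b);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, BitSet> index = new TreeMap<>();
        BitSet love = new BitSet();
        love.set(1);
        BitSet loved = new BitSet();
        loved.set(5);
        index.put("love", love);
        index.put("loved", loved);
        System.out.println(prefixSearch(index, "lov")); // prints {1, 5}
    }
}
```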
The way to work around the huge size of the "bitmap index" is to
store it in another format (like a list or ranged list) and
convert it to a bitmap when needed.
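The conversion could look something like the sketch below. It assumes verses are numbered by a 0-based ordinal and that a range is an inclusive {start, end} pair; both are assumptions for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class RangeConvert {
    // Expand inclusive {start, end} verse ranges into a bitmap.
    static BitSet toBitmap(List<int[]> ranges, int verseCount) {
        BitSet bits = new BitSet(verseCount);
        for (int[] r : ranges) {
            bits.set(r[0], r[1] + 1); // BitSet.set takes an exclusive upper bound
        }
        return bits;
    }

    // Collapse a bitmap back into inclusive {start, end} ranges for storage.
    static List<int[]> toRanges(BitSet bits) {
        List<int[]> ranges = new ArrayList<>();
        int start = bits.nextSetBit(0);
        while (start >= 0) {
            int end = bits.nextClearBit(start) - 1;
            ranges.add(new int[] {start, end});
            start = bits.nextSetBit(end + 1);
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<int[]> stored = new ArrayList<>();
        stored.add(new int[] {3, 5});
        stored.add(new int[] {10, 10});
        BitSet bits = toBitmap(stored, 31102);
        System.out.println(bits); // prints {3, 4, 5, 10}
    }
}
```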
I like your idea about the RangedPassage as well. It really makes
the list of verses for certain "common" words much smaller.
Where is your program located, Joe?
God bless,
nathan
-----Original Message-----
From: owner-sword-devel@crosswire.org
[mailto:owner-sword-devel@crosswire.org]On Behalf Of Joe Walker
Sent: 29 August 2000 05:23
To: sword-devel@crosswire.org
Subject: Re: [sword-devel] Another Important Issue
Hi,
If I understand correctly, the problem is that a search for
"the" or "Lord" comes up with lots of hits, and storing all those hits
in a fast index file uses a lot of space.
My Java program uses 3 ways to store lists of verses to combat this.
They are all available either in memory or on disk.
I have an interface "Passage" that stores a list of verses and 3
implementations:
DistinctPassage - stores a simple list of verses.
RangedPassage - stores a list of verse pairs (start and end).
BitwisePassage - stores a bitmap about 31k bits long, one bit per verse.
The last of these guarantees that any result set can be stored in only
a few K, and a fairly simple bit of maths can be used to work out which
of the three is the most compact for a given result set.
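The exact maths is not given here, so the following is only a guess at what such a calculation might look like. It assumes 2 bytes per verse in a distinct list (31102 verse ordinals fit in 16 bits), 4 bytes per start/end pair in a ranged list, and 31102 / 8 rounded up (3888 bytes) for the bitmap. The class name, method name and byte costs are all hypothetical.

```java
class PassageSizing {
    static final int VERSE_COUNT = 31102;
    // One bit per verse, rounded up to whole bytes: 3888.
    static final int BITMAP_BYTES = (VERSE_COUNT + 7) / 8;

    // Pick the representation with the smallest estimated size.
    // verses = number of hits, ranges = number of contiguous runs of hits.
    static String smallest(int verses, int ranges) {
        int distinct = 2 * verses; // assumed: one 16-bit ordinal per verse
        int ranged = 4 * ranges;   // assumed: two 16-bit ordinals per range
        if (distinct <= ranged && distinct <= BITMAP_BYTES) {
            return "distinct";
        }
        if (ranged <= BITMAP_BYTES) {
            return "ranged";
        }
        return "bitwise";
    }

    public static void main(String[] args) {
        System.out.println(smallest(5, 5));        // prints distinct
        System.out.println(smallest(3000, 10));    // prints ranged
        System.out.println(smallest(20000, 5000)); // prints bitwise
    }
}
```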
The bitmap also makes for very, very fast result set combining
methods. To search for "moses" AND "manna" I simply AND the bitmaps
together.
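A minimal sketch of that AND step, using java.util.BitSet as a stand-in for the bitmap. This is not the BitwisePassage code itself; the class name and the verse ordinals are invented for the example.

```java
import java.util.BitSet;

class AndSearch {
    // Return the verses present in both result sets, leaving the inputs untouched.
    static BitSet both(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet moses = new BitSet(31102);
        moses.set(100);
        moses.set(250);
        moses.set(900);
        BitSet manna = new BitSet(31102);
        manna.set(250);
        manna.set(900);
        manna.set(1200);
        System.out.println(both(moses, manna)); // prints {250, 900}
    }
}
```

The AND works on whole machine words at a time, which is why it stays fast however many hits each word has.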
Shout if you want to know more. All code is GPL.
Joe.