[sword-devel] Coming soon: new improved sword searching

Joel Mawhorter sword-devel@crosswire.org
Sat, 07 Sep 2002 22:42:37 -0700


With 95% less time and 7 essential nutrients!

Hi all,

Most of you don't know me but I've been hanging out in this list for a few 
years. I've been working on a Bible search program that I started in my last 
year of University as a guided project. My focus with this Bible program was 
to implement full featured searching for non-Latin based languages. What I 
want to see is people all over the world able to study the Bible in their own 
language. Several times in the past I have evaluated Sword and considered 
just putting my effort into that but the support for non-Latin languages just 
wasn't there. However, it now seems to be getting much closer and I think 
Sword will be more useful than what I could produce on my own.  Therefore, 
I've decided to join the Sword development project. My first priority is to 
make a few improvements to the searching mechanism in Sword. I am writing to 
the list to get feedback while I am still in the planning and early 
implementation stages of my work.

The first area that I will be working on is adding a new type of search to 
Sword. The new search type will be based on typical boolean search operations 
(AND, OR, NOT, and maybe XOR, using the operators &, |, !, and ^ respectively). 
Grouping with parentheses will be supported. For example, (God & (Father | 
Son | Spirit)) will give you all of the verses that have the word "God" and 
at least one of the words "Father", "Son", or "Spirit". Both word and phrase 
search terms will be supported within the same search expression. For 
example, (Jesus & "son of God") will find all verses with both the word and 
the phrase in them. I will also be adding a specialized AND operator that 
considers verse proximity. For example, ("lamb of God", Jesus, "take away", 
sins @3) will find all combinations of verses within 3 of each other that 
have all the search terms in them. This could be one verse that has all the 
search terms or any set of n verses (where n <= the number of search terms), 
each with one or more of the search terms, such that the two verses in the 
set that are farthest apart do not have more than two verses in between. I 
will also allow simple wildcards. I'm not sure how simple or complex that 
will be yet but at a minimum will allow something like (Jesus & lov*) which 
will find love, loving, etc. All of the above functions will be useable 
within one search expression. For example: 
((one*,"a phrase",two@2) ^ (three & !(four | five))). I'm not certain anyone 
would ever need a search expression of that complexity but it just gives an 
example of what will be possible. I intend this search functionality to be 
a practical superset of the existing search types. It won't be exactly a 
superset since it won't have full regular expression support. However, I 
think that with the functionality available, regular expressions won't be 
necessary. If any of you can think of an example of something that you do 
with the current regular expression searching that won't be possible with 
what I described above, please let me know.
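
To make the intended semantics concrete, here is a rough sketch in Python. It 
is only an illustration under stated assumptions, not Sword code: the toy 
verse table and the names matches, term and near are all made up, verses are 
assumed to be numbered consecutively, and the text is assumed to be free of 
punctuation. The boolean operators map directly onto set operations on verse 
numbers, and the proximity operator is done by brute force.

```python
from itertools import product

# Hypothetical toy data: verse number -> text. Numbers are assumed to run
# consecutively through the whole text; nothing here touches a Sword module.
VERSES = {
    1: "Behold the lamb of God",
    2: "Jesus loved them",
    3: "to take away sins",
    4: "God is Spirit",
    5: "God the Father sent the Son",
    9: "Jesus wept",
}
ALL = set(VERSES)


def matches(pattern, text):
    """True if the word or phrase occurs in text; a trailing * matches a prefix."""
    pats = pattern.lower().split()
    words = text.lower().split()

    def ok(p, w):
        return w.startswith(p[:-1]) if p.endswith("*") else w == p

    # Slide the pattern over the words and compare position by position.
    return any(
        all(ok(p, w) for p, w in zip(pats, words[i:i + len(pats)]))
        for i in range(len(words) - len(pats) + 1)
    )


def term(pattern):
    """Set of verse numbers whose text contains the word or phrase."""
    return {v for v, text in VERSES.items() if matches(pattern, text)}


def near(distance, *hit_sets):
    """Proximity AND: pick one verse per term and keep the combinations whose
    farthest-apart verses are at most `distance` apart (brute force)."""
    return {
        frozenset(combo)
        for combo in product(*hit_sets)
        if max(combo) - min(combo) <= distance
    }


# (God & (Father | Son | Spirit))
print(sorted(term("God") & (term("Father") | term("Son") | term("Spirit"))))  # [4, 5]
# (Jesus & lov*)
print(sorted(term("Jesus") & term("lov*")))  # [2]
# !God, taken relative to all verses
print(sorted(ALL - term("God")))  # [2, 3, 9]
# ("lamb of God", Jesus, "take away", sins @3)
hits = near(3, term("lamb of God"), term("Jesus"), term("take away"), term("sins"))
print([sorted(h) for h in hits])  # [[1, 2, 3]]
```

In this sketch NOT is taken relative to the set of all verses, and @3 means 
the highest and lowest verse numbers in a combination differ by at most 3. 
Trying every combination is fine for a toy but a real implementation would 
need something smarter.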

The second area that I will be working on is adding indexed searching where 
searching can be done on a precomputed index of search terms rather than the 
current mechanism where the whole Bible has to be read in from disk and 
searched in a brute force manner. This should decrease the search time to a 
very small fraction of what it currently is. One downside of indexed 
searching is that full regular expression searching isn't very feasible. I'll 
leave it as an exercise for the reader to verify that searching for /a.*b/ 
would be neither very easy to implement nor very fast using an index 
(grin).
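
As a rough illustration of the idea (a minimal in-memory sketch in Python, 
not the on-disk format Sword would use; the function names and sample verses 
are made up), an index can map each word to the set of verses containing it. 
A word search is then a single lookup, and a simple prefix wildcard is a 
range scan over the sorted word list.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(verses):
    """Map each lowercased word to the set of verse numbers containing it."""
    index = defaultdict(set)
    for number, text in verses.items():
        for word in text.lower().split():
            index[word].add(number)
    return dict(index)


def lookup(index, word):
    """Verses containing the exact word: one dictionary lookup, no text scan."""
    return index.get(word.lower(), set())


def lookup_prefix(index, sorted_words, prefix):
    """Verses containing any word that starts with prefix (e.g. lov*).
    Words sharing a prefix are adjacent in sorted order."""
    prefix = prefix.lower()
    hits = set()
    start = bisect_left(sorted_words, prefix)
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        hits |= index[word]
    return hits


# Hypothetical sample data, assumed to be free of punctuation.
verses = {
    1: "God is love",
    2: "Jesus loved them",
    3: "loving kindness",
    4: "the law",
}
index = build_index(verses)
sorted_words = sorted(index)

print(sorted(lookup(index, "God")))                       # [1]
print(sorted(lookup_prefix(index, sorted_words, "lov")))  # [1, 2, 3]
```

The index only knows about whole words, which is why a pattern like /a.*b/ 
that can span several words does not fit this scheme.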

I would really appreciate all of the feedback I can get on this since I would 
like the searching capabilities of Sword to be as strong as is reasonably 
possible. If you see any problems with what I am suggesting or if you have 
suggestions for other improvements to searching please send them to the list.

In Christ,

Joel Mawhorter