[sword-devel] search idea
Jerry Hastings
sword-devel@crosswire.org
Tue, 28 Dec 1999 22:25:23 -0700
At 01:59 PM 12/28/1999 +0000, Trevor Jenkins wrote:
>> Please give some thought to adding a synonym feature to searches.
><snip>
>A thesaurus feature does not affect the indexing of the Bible texts, books,
>dictionaries or any other material that might be of interest to us as
>searchers. Primarily it is part of the search language. So instead of saying
>
> FIND sardius* or topaz* or carbuncle* or emerald* or sapphire* or
> diamond* or jacinth* or agate* or amethyst* or beryl* or onyx* or
> jasper* or carnelian* or chryso*
>
>I could have said
>
> FIND NT(jewel)
I have also at times given thought to the type of searching you are
suggesting. It would be nice if we could have support for that kind of
thing. What I was suggesting was not that though. Being able to do the
NT(jewel) search would be a nice feature. However, the thesaurus search
will generate more hits than the one I was suggesting, and may need more
input from the user. That is both good and bad depending on what you want.
Take the cow/kine example. To use a thesaurus may require the user to
select how closely related words must be to be a hit. I would like to be
able to do that. It could be that what I really wanted to search for was
any kind of "cattle" and seeing that in the thesaurus would help to keep me
from missing other words that refer to cattle. With the search index I
suggested, a search for "cow" will only produce hits where someone
translated the Greek or Hebrew to "cow", and it would not require the user
to select a range of meaning. But searching for "cow" will generate a lot
more hits than searching for "kine" or "heifer" because BBE used only "cow"
for these words, but KJV used all three. In that example the thesaurus
search may be better. For another example, try "lamp." A thesaurus would
probably also search for "light." And while a lamp is a light, light does
not have to be a lamp. If the index being searched was produced only from
the BBE, which is kind of word poor, you cannot match "lamp." A thesaurus
search for "lamp" would probably have to hit on "light." However, if the
index was produced by indexing as many translations as possible, "lamp"
should produce hits on "lamp", even if the verses are displayed from BBE.
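
A minimal sketch of the "how closely related" choice a thesaurus search
might ask of the user. The RELATED table, the expand() and to_query()
names, and the word links are all made up for illustration; only the
FIND ... or ... form is taken from the example quoted above.

```python
# Hypothetical thesaurus: each term maps to terms one step away from it.
RELATED = {
    "cattle": ["cow", "ox", "bull"],
    "cow": ["kine", "heifer"],
}

def expand(term, steps):
    """Return `term` plus every term reachable within `steps` links."""
    terms = [term]
    frontier = [term]
    for _ in range(steps):
        next_frontier = []
        for current in frontier:
            for related in RELATED.get(current, []):
                if related not in terms:
                    terms.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return terms

def to_query(terms):
    """Build a FIND query that ORs the expanded terms together."""
    return "FIND " + " or ".join(terms)

print(to_query(expand("cow", 0)))
print(to_query(expand("cow", 1)))
print(to_query(expand("cattle", 2)))
```

With steps set to 0 the search behaves like a plain word search. Raising
it pulls in more words, and more hits, which is the good and the bad of it.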
>One serious issue is how are the thesaurus links to be established (i.e.
>that carnelian is a narrower term of jewel or that kine is a synonym for
>cow.)
That is one of the benefits of just searching multiple indexes or just
combining many into one, the translators have done the work for us. But for
a thesaurus, Roget's early editions are now PD and may provide a place to
start.
>An automated solution might be possible if every Bible text was marked up
>with Strong's numbers.
With indexes of the Greek and Hebrew texts, one could do a combined Greek,
Hebrew and English (or whatever) thesaurus. As long as you can search more
than one index at a time this should work without having to have Strong's
numbers in the text.
>> One way to do this is to search every bible index not just the index for
>> the current module. A better way is to create an index that combines
>> entries from other indexes.
>
>I'd prefer to keep this as separate indices. I know it's feasible to have
>multiple inverted files open and use them in parallel.
The reason to combine is so you don't have to have a thesaurus or a lot of
indexes. And if you only have the indexes that came with the translations
you downloaded and don't have a thesaurus you can't get hits on the
translations you don't have. With a combined index you only need to have
one translation. I can think of some people that would only have one. But
they would be able to get hits on the translations they don't have. Though,
those hits will be displayed in the translation they have. And the combined
index will be smaller than having a lot of indexes because we dump
duplicate information in the combined index. A combination of two indexes is
not nearly as big as the two kept separately, and it grows less with each
further index combined into it.
And because duplicate information was dumped when the index was made you
don't have to resolve duplicate hits when doing the search.
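
A rough sketch of the combined index, assuming each per-translation index
is a simple mapping from word to verse keys. The combine() function, the
"verse-N" keys, and the toy entries are placeholders for illustration, not
real index data or any existing Sword format.

```python
from collections import defaultdict

def combine(indexes):
    """Merge word -> verse mappings, storing each (word, verse) pair once."""
    combined = defaultdict(set)
    for index in indexes:
        for word, verses in index.items():
            combined[word].update(verses)  # the set drops duplicate verses
    return {word: sorted(verses) for word, verses in combined.items()}

def entry_count(index):
    """Count the (word, verse) pairs stored in an index."""
    return sum(len(verses) for verses in index.values())

# Toy per-translation indexes with placeholder verse keys.
kjv = {"kine": ["verse-1"], "cow": ["verse-2"], "lamp": ["verse-3"]}
bbe = {"cow": ["verse-1", "verse-2"], "light": ["verse-3"]}

combined = combine([kjv, bbe])

print(combined["cow"])    # both translations' hits, each verse listed once
print(combined["lamp"])   # found even if only the BBE text is installed
print(entry_count(kjv) + entry_count(bbe), entry_count(combined))
```

In this toy case the two separate indexes hold six entries between them
and the combined one holds five, since the shared "cow" entry is stored
once. The verse keys returned can be displayed from whichever translation
the user has.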
>> An index of English words could be created by
>> taking the data from all indexes of English Bibles.
>
>Being a European (one of that rare breed of an Englishman who can speak more
>than English and its American dialect) I would not restrict this feature to
>English only translations. I might for example search the RSV, KJV, NIV
>English translations against Swedish, Romanian and BSL translations.
I would not restrict it either. The English index is just an example of
what could be done. I don't think one size would fit all.
>Whilst technically possible this is somewhat "dangerous". It could become
>divisive due to differing theological assumptions.
Even translations and commentaries are divisive. The thing to do is provide
support for the format and features. Then others can take the heat for the
content. And the different camps can make their own indexes to the
specification.
>> Some other indexes that would be great to have are indexes of mood,
>> setting, topic, and action.
>
>Interesting idea. However, when the topic of mood etc comes up on the Bible
>Greek and Bible Hebrew (academic) lists the conversation goes on for weeks
>without any real resolution as to the mood of specific verses.
Mood may have been a bad choice of words. I was thinking of something like
disposition. Mood in the literary sense. Not mood in the grammatical sense
which I would guess those academics mean. And yes, some things would be
very subjective. But to me that can be better than nothing at all. No one
would be forced to use it. With the developments in markup, like ThML, we
may end up with a lot of markups we can index too. (Paul, are you there?
Can you think of any ThML markups you would like indexed?)
>A "better" solution might be to provide hooks for private thesaurus files so
>that individuals and groups could create their own classification schemes.
Yes. An open specification.
>I'll work on the inclusion of a thesaurus feature as part of the draft
>specification.
I am excited! A thesaurus would be great. And if you can, please include
support for doing Bible searches from more than just the index that comes
with a text module. Multiple index searches and searches of custom indexes
(made to the specification but with custom data).
Jerry Hastings