[sword-devel] Suggestion

Trevor Jenkins sword-devel@crosswire.org
Fri, 24 Dec 1999 20:28:30 +0000


On Friday, 24 December, 1999 14:09:11, Martin Gruner <mg.pub@gmx.net> made a
brave suggestion:

> I have been on the sword-devel list for quite a while.
> I am neither a theologian (not yet) nor a skilled programmer.

I am not a theologian either (except in SGML) but I am an experienced
programmer and a (text) database expert.

> But i wanted to send you my (perhaps very stupid) ideas. So if this is trash,
> just delete the mail and forget it.
>
> As I understood, sword has its texts stored in different formats (rawtext,
> rawcom, rawgbf...) in text files and those .vss index files.

It does concern me that there are several different formats. And do I have
it right that these formats are machine and/or operating system specific? As
I've got a Linux machine, two Apple Macs and a few Windows PCs on my home
intranet, I would like to keep the SWORD files on my Linux machine, where I
have the most disk space, and access the one copy from my Macs or PCs (via
AppleTalk or Samba).
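
To illustrate what "machine specific" can mean for a binary index file, here
is a minimal sketch in Python. The record layout (a 4-byte offset and a
2-byte length per verse) is purely hypothetical and is not taken from the
actual .vss files; the point is only that fixing the byte order and field
sizes explicitly makes the same file readable from Linux, Mac and Windows
clients alike.

```python
import struct

# Hypothetical index record: where one verse starts in the text file and how
# long it is. "<" fixes little-endian byte order and disables native padding,
# so the record is 6 bytes on every platform.
RECORD = struct.Struct("<IH")  # 4-byte unsigned offset, 2-byte unsigned length


def pack_entry(offset, length):
    """Serialise one index entry in the fixed, platform-independent layout."""
    return RECORD.pack(offset, length)


def unpack_entry(data):
    """Read one index entry back, regardless of the host's native byte order."""
    return RECORD.unpack(data)


entry = pack_entry(1024, 57)

# The same number written in the two byte orders gives different bytes, which
# is why a file written in "native" order may not be portable.
little = struct.pack("<I", 1024)
big = struct.pack(">I", 1024)
```

With an explicit layout like this, the one copy on the file server can be
shared over the network without caring which machine wrote it.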

> Somebody on the list wished to get rid of indexing.

Without wanting to hurt "somebody"'s feelings, that could be a stupid idea.
Whilst creating an index file is expensive, it is a one-time expense.
Extracting the same information several times is even more expensive. I
realise that I'm biased in this assertion, but I have worked in text and
document searching for 20+ years. Scanning text for the same words twice is
going to be more expensive than doing it once and keeping the (partial)
results around. The index files are those partial results. If they take up
additional disk space, I don't mind; what I do mind is waiting for my
searches to complete.
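
The trade-off can be sketched in a few lines of Python. This is only an
illustration under assumed names (build_index, search_scan, search_index and
the tiny sample texts are all made up for the example), not how SWORD itself
builds or stores its indexes. The index is built by scanning the text once;
after that each query is a dictionary lookup instead of another full scan.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'"


def words_of(text):
    """Split text into lower-case words with surrounding punctuation removed."""
    return [w.strip(PUNCTUATION) for w in text.lower().split()]


def build_index(verses):
    """One-time cost: scan every verse once, recording where each word occurs."""
    index = defaultdict(set)
    for ref, text in verses.items():
        for word in words_of(text):
            index[word].add(ref)
    return index


def search_scan(verses, word):
    """No index: every query rescans all of the text."""
    word = word.lower()
    return {ref for ref, text in verses.items() if word in words_of(text)}


def search_index(index, word):
    """With an index: every query is a single lookup of the stored results."""
    return index.get(word.lower(), set())


# Made-up sample data for the illustration.
verses = {
    "Gen 1:1": "In the beginning God created the heaven and the earth.",
    "John 1:1": "In the beginning was the Word, and the Word was with God.",
    "John 11:35": "Jesus wept.",
}
index = build_index(verses)
```

Both search functions return the same references; the difference is that
search_scan repeats the work of build_index on every call, while
search_index reuses the stored partial results.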

> Would it be worth to consider the following thing:
> Why not build database files from the modules that could be accessed through
> SQL for example(i think no indexing would be necessary then). That should
> provide an easy interface for complex searches, and remove the necessity of
> different formats within sword (though it would also be possible to store the
> original formats inside the databases).

A single storage format would be a "database". For reasons too complex to go
into at the moment, SQL is NOT a good choice for a search language where
text is involved.

> Now you will have recognized that I do not really know what I am writing of
> (unicode, sql....) I'm sorry for disturbing you if this is only trash, but i
> did not get rid of the thoughts.

You have raised some very serious issues. It takes the naive to shame the
wise---now where have I heard something like that? ;-)

The only thing I would argue with is a database system. This adds another
dependency that some users might not be able to satisfy. We should not,
for example, assume that Microsoft Windows users have Access installed or
SQL Server. (As an aside, do you know that Microsoft don't use SQL Server as
the text storage engine in their Exchange Server product? They had to write
a text-friendly sub-system instead.) And on UNIX platforms, what database
should we plump for? Maybe MySQL or GNU SQL (but is that complete yet?), or
Postgres, or University Ingres, or a commercial system such as DB2,
Adabas D, etc.? My preference is for a general-purpose library that SWORD
and bibletime can build upon rather than presuming the existence of a
traditional database system, which in any case would have difficulty with
free text.

Just my 0.2c, but I've earnt those the hard way by working with text systems
for 20+ years. After that time I know a thing or two about how it can be
done.

Regards, Trevor

British Sign Language is not inarticulate handwaving; it's a living
language. So recognise it now.

--

<>< Re: deemed!