[sword-devel] Suggestion

Trevor Jenkins sword-devel@crosswire.org
Tue, 04 Jan 2000 14:51:01 +0000


On Tuesday, 4 January, 2000 03:26:42, darwin@ichristian.com 
<darwin@ichristian.com> wrote:

> Trevor Jenkins wrote:
>
>> An even more idle thought I've just had is to construct the indices on the
>> first occasion that the user looks for anything.
>
> I would think that it would be far more costly to construct an inverted
> file than reading sequentially through a file, therefore imposing an
> incredible cost on the first query.  Is this an unfounded concern?

Of course, constructing the index files is costly at first. But it will
save time for any subsequent search. The benefit of the approach is that
the file has to be read sequentially at least once to build the index.
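
To make the idea concrete, here is a minimal sketch of building the index
on the first search rather than in advance. It is an illustration under
stated assumptions, not SWORD code: the class name LazyIndexedText, the
(reference, text) verse layout and the three sample verses are all
hypothetical.

```python
import re
from collections import defaultdict


class LazyIndexedText:
    """Hypothetical wrapper that builds an inverted index on first use."""

    def __init__(self, verses):
        # verses: list of (reference, text) pairs, assumed already loaded
        self.verses = verses
        self.index = None  # nothing is built until the first query

    def _build_index(self):
        # One sequential pass over the text: map each word to the
        # positions of the verses that contain it.
        index = defaultdict(set)
        for pos, (_ref, text) in enumerate(self.verses):
            for word in re.findall(r"[a-z']+", text.lower()):
                index[word].add(pos)
        self.index = index

    def search(self, word):
        if self.index is None:
            # Only the first query pays the construction cost.
            self._build_index()
        hits = sorted(self.index.get(word.lower(), ()))
        return [self.verses[pos][0] for pos in hits]


# Sample data for illustration only.
verses = [
    ("Gen 1:1", "In the beginning God created the heaven and the earth."),
    ("Gen 1:3", "And God said, Let there be light: and there was light."),
    ("John 1:1", "In the beginning was the Word, and the Word was with God, "
                 "and the Word was God."),
]
bible = LazyIndexedText(verses)
print(bible.search("beginning"))  # ['Gen 1:1', 'John 1:1']
print(bible.search("light"))      # ['Gen 1:3']
```

The second call reuses the index built by the first, which is where the
saving on subsequent searches comes from.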

> What would some of the issues be in simply having the inverted file created
> before staging for download, ...

I see this as preferable. It means that only one system takes the
performance hit.

> and making it an optional part of the download?

As the index files will provide improved performance and features it
probably won't be optional for the majority of people.

> How about having it created as part of the installation procedure?

Second best to pre-stage creation.

> Are there issues that would require the inverted file to be created on the
> target system?

I am not aware of any.

> ...  Would a higher level of optimization be possible?

There are several "levels of optimization" to be considered as part of this
discussion. Firstly, have the correct algorithms been selected for the
linear search, for the boolean search execution, and for the construction
of the index? At this moment in time, selecting a compiler's optimization
level is too low a level of detail.
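
As an illustration of what algorithm choice means for boolean search
execution, the sketch below evaluates AND, OR and NOT over sorted postings
lists. The function names and the postings data are hypothetical; the AND
uses the standard linear merge of two sorted lists, which is one reasonable
choice rather than a description of any existing implementation.

```python
def intersect(a, b):
    """AND: linear merge of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """OR: every entry found in either list, kept sorted."""
    return sorted(set(a) | set(b))


def difference(a, b):
    """NOT: entries of a that do not appear in b."""
    excluded = set(b)
    return [x for x in a if x not in excluded]


# Hypothetical postings: word -> sorted verse numbers containing it.
postings = {
    "god": [0, 2, 5, 9],
    "light": [2, 3, 9],
    "darkness": [3, 9],
}

both = intersect(postings["god"], postings["light"])
print(both)                                      # [2, 9]
print(union(postings["god"], postings["darkness"]))  # [0, 2, 3, 5, 9]
print(difference(both, postings["darkness"]))    # [2]
```

The merge touches each list entry at most once, so an AND costs time
proportional to the combined length of the two lists rather than to the
size of the whole text.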

One area where "optimisation" might give the existing linear search some
speed is to provide a strstr() based on an efficient string matching
algorithm. As it's a library routine we are at the mercy of someone else for
its performance. We just don't know what algorithm they selected; it might
be a linear scan, Boyer-Moore, Knuth-Morris-Pratt or something they invented
themselves.
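
For illustration, here is a minimal sketch of one such algorithm,
Boyer-Moore-Horspool, a simplified variant of Boyer-Moore. It is written in
Python for brevity and is not a claim about what any C library's strstr()
does; the function name horspool_find is hypothetical.

```python
def horspool_find(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # For each character of the pattern except the last, record how far
    # the window may slide when that character is aligned with the
    # window's final position. Later occurrences overwrite earlier ones,
    # which keeps the smallest (safe) shift.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Characters absent from the table allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1


sample = "In the beginning God created the heaven and the earth."
print(horspool_find(sample, "God"))     # 17
print(horspool_find(sample, "heaven"))  # same result as sample.find("heaven")
print(horspool_find(sample, "hell"))    # -1
```

The gain over a plain linear scan comes from the skip table: when the last
character of the window does not occur in the pattern, the search jumps
ahead by the full pattern length instead of by one.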

> Assuming a mid-range Intel based system (perhaps a 350 MHz Celeron) how
> long would it take to create an inverted file for an existing bible text?

That is still to be decided. However, in the first instance I'm looking for
good performance on my Linux workstation, which has a 486 processor
installed. There's also the I/O configuration to be considered: how fast
can the bible text be read from a disk, and how fast can the index files be
written to a disk? Is there to be only one disk or several? Where are any
work files to be placed (locally or on an NFS mounted disk)? How does the
operating system buffer the reads and writes?
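
One way to start separating these costs is to time the sequential read and
the index construction independently. The sketch below is only a rough
measurement aid: the function name time_index_build and the generated
sample file are hypothetical, it measures reading and in-memory
construction but not writing the index to disk, and operating system
buffering means a second run will usually read from cache rather than from
the disk.

```python
import os
import re
import tempfile
import time
from collections import defaultdict


def time_index_build(path):
    """Time the sequential read and the index build as separate phases."""
    t0 = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    t1 = time.perf_counter()

    # Word -> list of line numbers, built in memory only.
    index = defaultdict(list)
    for lineno, line in enumerate(lines):
        for word in set(re.findall(r"[a-z']+", line.lower())):
            index[word].append(lineno)
    t2 = time.perf_counter()

    return {"read_s": t1 - t0, "build_s": t2 - t1, "words": len(index)}


# Generate a small stand-in text file; a real test would use a bible text.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("In the beginning\n" * 10000)
    sample_path = tmp.name

try:
    result = time_index_build(sample_path)
    print("read: %.4f s, build: %.4f s, distinct words: %d"
          % (result["read_s"], result["build_s"], result["words"]))
finally:
    os.remove(sample_path)
```

Running this on each candidate disk layout (local disk, NFS mount) would
give a first comparison, though the absolute figures depend entirely on the
machine.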

> These are the first questions I would ask before deciding when to create
> the inverted file.

You are right to ask them. Some of them need more analysis before I can
answer them with precise timings.

Regards, Trevor

British Sign Language is not inarticulate handwaving; it's a living
language. So recognise it now.

--

<>< Re: deemed!