[jsword-devel] Fwd: Search Index Downloading
Troy A. Griffitts
scribe at crosswire.org
Tue Oct 12 21:29:18 MST 2004
Thinking about it more and more, I don't think this is something we want
to manage (at least I don't want to manage it). Talking about it with some
of the C++ guys, I had comments like:
Many places in the world don't have the bandwidth to download 6.8
megs of index for the KJV (current CLucene index size). [Much less all
these other indices you may want to add.]
Can't they [the jsword guys] nice a process [spawn a thread] to
index in the background?
____ end of comments ____
I tend to agree. The size and management headache aren't worth the 5
minute savings for the user. And if we can get the 5 minutes down to 2,
and run those 2 in the background where they are barely noticed, I think
our time is better spent on that.
-Troy.
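
A minimal sketch of the "nice a process / spawn a thread" suggestion quoted above, in Java. The class and method names (BackgroundIndexer, startInBackground) are hypothetical and are not part of JSword; the Runnable stands in for whatever code actually builds the Lucene index. Thread priority is only a hint to the scheduler, so this is a rough analogue of nice rather than a guarantee.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BackgroundIndexer {
    /** Starts the given indexing job on a daemon thread at minimum priority. */
    public static Thread startInBackground(Runnable indexJob) {
        Thread worker = new Thread(indexJob, "index-builder");
        worker.setPriority(Thread.MIN_PRIORITY); // scheduler hint, similar in spirit to "nice"
        worker.setDaemon(true);                  // do not keep the application alive just for indexing
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread t = startInBackground(() -> {
            // placeholder for the real indexing work (assumption)
            done.set(true);
        });
        t.join(); // a real UI would keep running instead of waiting here
        System.out.println("indexed: " + done.get());
    }
}
```

Because the worker is a daemon thread, an index that is half built when the user quits would be abandoned, so it would need to be built somewhere it cannot be mistaken for a finished one.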
On Tue, 12 Oct 2004, DM Smith wrote:
> The basic issues that I see are:
> 1) As Lucene is upgraded it may invalidate an index built against an earlier
> version.
> 2) If an upgraded Lucene is backward compatible, we still may want to
> re-index to get more features.
> 3) If a module is upgraded we will need to re-index (as you pointed out.)
> 4) As we create indexes for other features (e.g. transliteration of Greek and
> Hebrew; removal of accents, breathing, diacriticals, ...), these will be
> subject to the same issues.
> 5) Old indexes need to be retained according to some reasonable policy. At
> any given time we may need to support more than one version of the index.
> 6) An index should not be made visible until it is completed. (e.g. build in
> an alternate directory and then rename the directory when it is finished)
>
> All this seems to point to the fact that versioning of the indexes is
> necessary and will need to be well thought out.
>
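
Point 6 above (build in an alternate directory, then rename when finished) could be sketched in Java roughly as follows. This uses java.nio.file, which is newer than the Java available when this thread was written, and all names (IndexPublisher, IndexBuilder, buildAndPublish) are hypothetical. It assumes the final directory does not exist yet and that the filesystem supports atomic renames.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class IndexPublisher {
    /** Hypothetical callback that writes the index files into the given directory. */
    public interface IndexBuilder {
        void buildInto(Path dir) throws IOException;
    }

    /** Builds into a temporary sibling directory, then renames it to finalDir. */
    public static Path buildAndPublish(Path finalDir, IndexBuilder builder) throws IOException {
        Path target = finalDir.toAbsolutePath();
        Path parent = target.getParent();
        Files.createDirectories(parent);
        // The temp directory sits next to the target so the rename stays on one filesystem.
        Path tmp = Files.createTempDirectory(parent, target.getFileName() + ".building");
        builder.buildInto(tmp);
        // Assumes target does not exist; throws if the filesystem cannot move atomically.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Readers only ever look at the final directory name, so they see either no index or a complete one. Cleaning up a leftover ".building" directory after a crash is not handled in this sketch.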
> I think we will certainly need to have metadata describing the index. It
> may be possible to use path names to do this.
> It should contain sufficient version information to tie it to a particular
> version of Lucene, to the versions of Sword and JSword that can use it, and
> to the particular version of the module.
> If we maintained a checksum for the module, we could probably automate the
> re-indexing of modules. From the server logs, we can probably figure out a
> good (idle) time to do it.
>
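
The metadata and checksum idea above might look something like this in Java. This is only a sketch: the class name, field names, and the choice of SHA-256 are assumptions, and the Sword/JSword version fields mentioned above are left out for brevity. An index is treated as stale when the Lucene version, the module version, or the module checksum no longer matches what was recorded at build time.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IndexMetadata {
    final String luceneVersion;   // Lucene version the index was built with
    final String moduleVersion;   // version of the module that was indexed
    final String moduleChecksum;  // hex SHA-256 of the module data at index time

    public IndexMetadata(String luceneVersion, String moduleVersion, String moduleChecksum) {
        this.luceneVersion = luceneVersion;
        this.moduleVersion = moduleVersion;
        this.moduleChecksum = moduleChecksum;
    }

    /** Returns the SHA-256 digest of the module data as lowercase hex. */
    public static String checksum(byte[] moduleBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(moduleBytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    /** True if the index no longer matches the current Lucene or module. */
    public boolean isStale(String currentLucene, String currentModuleVersion, byte[] moduleBytes) {
        return !luceneVersion.equals(currentLucene)
            || !moduleVersion.equals(currentModuleVersion)
            || !moduleChecksum.equals(checksum(moduleBytes));
    }

    public static void main(String[] args) {
        // Version strings and module contents here are made up for illustration.
        byte[] module = "In the beginning".getBytes(StandardCharsets.UTF_8);
        IndexMetadata meta = new IndexMetadata("1.4", "2.1", checksum(module));
        System.out.println("stale: " + meta.isStale("1.4", "2.1", module));
    }
}
```

A server-side job could run a check like isStale over each module during an idle period and re-index only the ones that changed.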
> Troy A. Griffitts wrote:
>
>> Hey guys,
>> I'd like to do some experiments to see if CLucene and Java Lucene
>> indices are binary compatible.
>>
>> I also like the idea of a subdirectory under idx for keeping different
>> kinds of indices. I might suggest even 1 more level under L1, if you are
>> planning for version changes of your index structure.
>>
>> e.g. C++ SWORD supports a pluggable index architecture, and we are
>> hoping to write some cool indexers for morphologically declined searches,
>> etc. We could keep pre-generated index sets under different
>> subdirectories under idx for each plugin.
>>
>> On the downside, we release updated modules on a regular basis (some
>> modules more 'regular' than others). Keeping the indices up to date for
>> each module should not be the module creator's responsibility. I wouldn't
>> expect our current maintainers to run a number of different indexers every
>> time they release a new module, unless the process was nearly completely
>> automated to handle ALL types of indexing.
>>
>> Up until this consideration, we have always taken the methodology of
>> generating anything needed for a plugin on demand on the end user's
>> system. Which is always the least maintenance option for us :)
>>
>> -Troy.
>>
>>
>> On Mon, 11 Oct 2004, Joe Walker wrote:
>>
>>> Getting Reply and ReplyAll confused again ...
>>>
>>> ---------- Forwarded message ----------
>>> From: Joe Walker <joseph.walker at gmail.com>
>>> Date: Mon, 11 Oct 2004 08:12:18 +0100
>>> Subject: Re: Search Index Downloading
>>> To: "Troy A. Griffitts" <scribe at crosswire.org>
>>>
>>> How about we use /pub/sword/raw/idx/L1/[book].zip then?
>>> If Java Lucene indexes and CLucene indexes are compatible then it
>>> won't be proprietary to JSword. If they are not compatible, or if you
>>> want to use different options in creating the index then you can use
>>> /pub/sword/raw/idx/C1/[book].zip or something.
>>>
>>> Joe.
>>>
>>>
>>>
>>> On Sun, 10 Oct 2004 22:12:00 -0700, Troy A. Griffitts
>>> <scribe at crosswire.org> wrote:
>>>
>>>> Hey Joe,
>>>> That's fine. Let me know if there is anything I need to do for you.
>>>> Don't we have a /pub/jsword directory for your stuff? I understand what
>>>> you mean by having the same base directory for modules (which would be
>>>> /pub/sword/raw for our server, so maybe /pub/sword/raw/idx), but this
>>>> isn't a sword module data structure. This is jsword's proprietary (in
>>>> the sense of not publicly sword declared) data. It would be nice to
>>>> unify a common index format for sword modules.
>>>>
>>>> Does it really take Lucene 5+ minutes to generate? That's a bummer.
>>>> You would think it wouldn't take much longer than a single non-index
>>>> search thru the Bible.
>>>>
>>>> To belatedly answer your question on sword-devel, I honestly have no
>>>> idea if CLucene indices are binary compatible with the Java Lucene
>>>> counterpart.
>>>>
>>>> -Troy.
>>>>
>>>>
>>>>
>>>>
>>>> Joe Walker wrote:
>>>>
>>>>> Hi Troy,
>>>>>
>>>>> I'd like to allow users of Bible Desktop to download search indexes
>>>>> because they take about 5 mins to generate. A search index is
>>>>> 2-3 MB per book so it ought not to take up too much space.
>>>>>
>>>>> Ideally we would use an FTP directory on crosswire something like:
>>>>> - /pub/sword/search/jsword/L1/[book].zip
>>>>>
>>>>> It starts /pub/sword so that if the beta modules site (or other
>>>>> download sites) come online we can just remember one root path per
>>>>> module site. The search/jsword bit would keep our stuff from getting
>>>>> in anyone else's way. L1 is simply a version number so we can update
>>>>> the index format without huge turmoil.
>>>>>
>>>>> Is that OK?
>>>>> Thanks,
>>>>>
>>>>> Joe.
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> jsword-devel mailing list
>>> jsword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>
>