[jsword-devel] Fwd: Search Index Downloading

DM Smith dmsmith555 at yahoo.com
Tue Oct 12 18:43:45 MST 2004


The basic issues that I see are:
1) As Lucene is upgraded it may invalidate an index built against an 
earlier version.
2) If an upgraded Lucene is backward compatible, we may still want to 
re-index to get more features.
3) If a module is upgraded we will need to re-index (as you pointed out).
4) As we create indexes for other features (e.g. transliteration of 
Greek and Hebrew; removal of accents, breathing marks, diacriticals, ...), 
these will be subject to the same issues.
5) Old indexes need to be retained according to some reasonable policy. 
At any given time we may need to support more than one version of the index.
6) An index should not be made visible until it is complete (e.g. 
build it in an alternate directory and then rename the directory when it 
is finished).
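
Point 6 can be sketched roughly as follows. This is only an illustration, not existing JSword code: the class name IndexPublisher, the ".building" suffix and the builder callback are all made up for the example. It assumes the final directory does not exist yet (which holds if every index version gets its own directory) and that the staging directory is on the same filesystem, so the rename is cheap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;

/** Hypothetical helper: builds an index out of sight, then renames it into place. */
final class IndexPublisher {

    /** Sibling directory used while building, e.g. idx/L1/KJV.building (the suffix is an assumption). */
    static Path stagingDirFor(Path finalDir) {
        return finalDir.resolveSibling(finalDir.getFileName() + ".building");
    }

    /**
     * Runs the builder against the staging directory, then renames the
     * staging directory to finalDir. Readers that only ever look at
     * finalDir never see a half-built index.
     * Assumes finalDir does not exist yet; leftovers from a failed
     * earlier build are not cleaned up in this sketch.
     */
    static void buildAndPublish(Path finalDir, Consumer<Path> builder) {
        Path staging = stagingDirFor(finalDir);
        try {
            Files.createDirectories(staging);
            builder.accept(staging); // the real indexer would run here
            try {
                Files.move(staging, finalDir, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                Files.move(staging, finalDir); // fall back to a plain rename
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the build fails partway, only the staging directory is left behind, so whatever is already published stays usable.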

All of this seems to point to the conclusion that versioning of the 
indexes is necessary and will need to be well thought out.

I think we will certainly need to have metadata describing the index. 
It may be possible to use path names to do this.
It should contain sufficient version information to tie it to a 
particular version of Lucene, to the versions of Sword and JSword that 
can use it, and to the particular version of the module.
If we maintained a checksum for the module, we could probably automate 
the re-indexing of modules. From the server logs, we can probably figure 
out a good (idle) time to do it.
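
As a rough sketch of the metadata and checksum idea: the class IndexMetadata, the property keys and the choice of SHA-256 below are all assumptions made for illustration, not an agreed format. Only the Lucene version and the module checksum are compared here; the Sword and JSword versions could be recorded and checked the same way.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Properties;

/** Hypothetical description of an index, stored next to it as a properties file. */
final class IndexMetadata {
    // Key names are made up for this sketch.
    static final String LUCENE_VERSION = "lucene.version";
    static final String MODULE_VERSION = "module.version";
    static final String MODULE_CHECKSUM = "module.checksum";

    /** Hex-encoded SHA-256 of the raw module data (the algorithm is an assumption). */
    static String sha256Hex(byte[] moduleData) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(moduleData)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Records what the index was built from. */
    static Properties describe(String luceneVersion, String moduleVersion, byte[] moduleData) {
        Properties p = new Properties();
        p.setProperty(LUCENE_VERSION, luceneVersion);
        p.setProperty(MODULE_VERSION, moduleVersion);
        p.setProperty(MODULE_CHECKSUM, sha256Hex(moduleData));
        return p;
    }

    /** True if the Lucene version or the module contents differ from what was recorded. */
    static boolean needsReindex(Properties stored, String currentLucene, byte[] currentModuleData) {
        return !currentLucene.equals(stored.getProperty(LUCENE_VERSION))
            || !sha256Hex(currentModuleData).equals(stored.getProperty(MODULE_CHECKSUM));
    }
}
```

A scheduled job on the server could call needsReindex for each module during the idle window and rebuild only the indexes that report true.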

Troy A. Griffitts wrote:

> Hey guys,
>     I'd like to do some experiments to see if CLucene and Java Lucene 
> indices are binary compatible.
>
>     I also like the idea of a subdirectory under idx for keeping 
> different kinds of indices.  I might even suggest 1 more level under 
> L1, if you are planning for version changes of your index structure.
>
>     e.g. C++ SWORD supports a pluggable index architecture, and we are 
> hoping to write some cool indexers for morphologically declined 
> searches, etc.  We could keep pre-generated index sets under different 
> subdirectories under idx for each plugin.
>
>     On the downside, we release updated modules on a regular basis-- 
> some modules more 'regular' than others.  Keeping the indices up to 
> date for each module should not be the module creator's 
> responsibility.  I wouldn't expect our current maintainers to run a 
> number of different indexers every time they release a new module, 
> unless the process was nearly completely automated to handle ALL types 
> of indexing.
>
>     Up until this consideration, we have always taken the methodology 
> of generating anything needed for a plugin on demand on the end user's 
> system.  Which is always the least maintenance option for us :)
>
>     -Troy.
>
>
> On Mon, 11 Oct 2004, Joe Walker wrote:
>
>> Getting Reply and ReplyAll confused again ...
>>
>> ---------- Forwarded message ----------
>> From: Joe Walker <joseph.walker at gmail.com>
>> Date: Mon, 11 Oct 2004 08:12:18 +0100
>> Subject: Re: Search Index Downloading
>> To: "Troy A. Griffitts" <scribe at crosswire.org>
>>
>> How about we use /pub/sword/raw/idx/L1/[book].zip then?
>> If Java Lucene indexes and CLucene indexes are compatible then it
>> won't be proprietary to JSword. If they are not compatible, or if you
>> want to use different options in creating the index then you can use
>> /pub/sword/raw/idx/C1/[book].zip or something.
>>
>> Joe.
>>
>>
>>
>> On Sun, 10 Oct 2004 22:12:00 -0700, Troy A. Griffitts
>> <scribe at crosswire.org> wrote:
>>
>>> Hey Joe,
>>>         That's fine.  Let me know if there is anything I need to do 
>>> for you.
>>> Don't we have a /pub/jsword directory for your stuff?  I understand 
>>> what
>>> you mean by having the same base directory for modules (which would be
>>> /pub/sword/raw for our server, so maybe /pub/sword/raw/idx), but this
>>> isn't a sword module data structure.  This is jsword's proprietary (in
>>> the sense of not publicly sword declared) data.  It would be nice to
>>> unify a common index format for sword modules.
>>>
>>>         Does it really take lucene 5+ minutes to generate?  That's a 
>>> bummer.
>>> You would think it wouldn't take much longer than a single non-index
>>> search thru the Bible.
>>>
>>>         To belatedly answer your question on sword-devel, I honestly 
>>> have no
>>> idea if CLucene indices are binary compatible with the Java Lucene
>>> counterpart.
>>>
>>>         -Troy.
>>>
>>>
>>>
>>>
>>> Joe Walker wrote:
>>>
>>>> Hi Troy,
>>>>
>>>> I'd like to allow users of Bible Desktop to download search indexes
>>>> because they take about 5 mins to generate. A search index is between
>>>> 2 and 3 MB per book so it ought not to take up too much space.
>>>>
>>>> Ideally we would use an FTP directory on crosswire something like:
>>>> - /pub/sword/search/jsword/L1/[book].zip
>>>>
>>>> It starts /pub/sword so that if the beta modules site (or other
>>>> download sites) come online we can just remember one root path per
>>>> module site. The search/jsword bit would keep our stuff from getting
>>>> in anyone else's way. L1 is simply a version number so we can update
>>>> the index format without huge turmoil.
>>>>
>>>> Is that OK?
>>>> Thanks,
>>>>
>>>> Joe.
>>>
>>>
>>>
>> _______________________________________________
>> jsword-devel mailing list
>> jsword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>


