[jsword-devel] Search Highlight

DM Smith dmsmith at crosswire.org
Wed Apr 29 05:08:07 MST 2009


On Apr 29, 2009, at 4:06 AM, Tonny Kohar wrote:

> Hi,
>
> On Mon, Apr 27, 2009 at 4:42 PM, Tonny Kohar <tonny.kohar at gmail.com>  
> wrote:
>>
>> My initial finding is that there seems to be no need for an API
>> change; what it needs is simply a new package, e.g.
>> o.c.jsword.index.highlight, and inside that package a
>> static/factory/builder Highlight class which accepts either raw
>> text, OSIS XML, or HTML output.
>> ....
>
> After trying that approach for a few days, sorry, that one does not
> work :), since it is not based on the original index.
>
> So I tried another approach: patching LuceneIndex to store (in an
> array/map) the Doc.FieldContent that matched the query, to avoid
> changing the index structure (e.g. adding a positional field), and
> then applying that to the OSIS XML text. This approach does not work
> either (or is difficult to map back) if the query is a phrase,
> because of the OSIS XML structure, e.g. a word can be surrounded by
> a lemma/strong tag. It does work for fuzzy (non-phrase) search.
>
> So I guess I will try another approach in the next few days. If I
> still can't get it right, I will put search highlighting aside due
> to the complexity involved and revisit it later.

Tonny,

I am excited that you are working on this!

I don't have a problem with changing the structure of the lucene index  
to include information that is useful. At some point, the index will  
change.

We have the start of a mechanism to track such changes so that the UI
can properly notify a user that an index rebuild is advisable. Here are
some scenarios:
1) More features are available in the engine. E.g. a new field can be  
searched. Highlighting can be done. ...
2) The index has changed (like in the case of adding position/offset)
3) The underlying engine, lucene, has changed.

The class o.c.j.index.lucene.IndexMetadata is used to track the  
current implementation levels. It is used in only one place in JSword.

The class needs to be moved to o.c.j.index, the method
getLuceneVersion needs to be renamed to getImplementationVersion, and a
getImplementationName method needs to be added, in case we need to
change the engine.

When an index is created, this metadata needs to be stored with the  
index.
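As a rough illustration only (this is not existing JSword code), the
metadata could be written as a small properties file in the index
directory. The class name, file name, and property keys below are all
hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: stores version metadata next to the index files.
final class IndexMetadataStore {
    // Assumed file name and keys; nothing in JSword defines these.
    static final String FILE_NAME = "index-metadata.properties";
    static final String KEY_IMPL_NAME = "implementation.name";
    static final String KEY_IMPL_VERSION = "implementation.version";
    static final String KEY_INDEX_VERSION = "index.version";

    // Write the metadata into the directory that holds the index.
    static void write(Path indexDir, String implName, String implVersion,
                      String indexVersion) throws IOException {
        Properties props = new Properties();
        props.setProperty(KEY_IMPL_NAME, implName);
        props.setProperty(KEY_IMPL_VERSION, implVersion);
        props.setProperty(KEY_INDEX_VERSION, indexVersion);
        try (OutputStream out = Files.newOutputStream(indexDir.resolve(FILE_NAME))) {
            props.store(out, "Index metadata (illustrative sketch)");
        }
    }

    // Read the metadata back; an empty result means the index predates it.
    static Properties read(Path indexDir) throws IOException {
        Properties props = new Properties();
        Path file = indexDir.resolve(FILE_NAME);
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }
}
```

An index with no such file would simply be treated as older than any
version that records metadata.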


If the implementation version has changed, the index may need to be
rebuilt. The number has three parts: major.minor.point. If the major
has changed, the index needs to be rebuilt. If the minor has changed,
it would probably be best to rebuild. If only the point has changed,
it doesn't need to be rebuilt.

When an index is used, if the index version has changed, then the  
index needs to be rebuilt. It'd probably be good to go to the  
major.minor.point notation for our releases.

Right now, the index version is a simple incrementing number, starting
at 1.1, but it could be changed easily, given that there is only one
use of it.

IndexManager should be changed to ask for a recommendation on whether  
to rebuild the index. From the above, it should be a multi-level answer.
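To make the major/minor/point rule and the multi-level answer concrete,
here is a minimal sketch. RebuildAdvisor and its Recommendation levels
are names invented for this example; IndexManager has no such method
today:

```java
// Hypothetical sketch of a multi-level rebuild recommendation.
final class RebuildAdvisor {
    enum Recommendation { NONE, ADVISABLE, REQUIRED }

    // Parse "major.minor.point"; missing parts are treated as 0.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    // Compare the implementation version stored with the index against
    // the one in use now: major change -> rebuild, minor change ->
    // rebuild advisable, point change -> nothing to do.
    static Recommendation recommend(String storedImpl, String currentImpl) {
        int[] stored = parse(storedImpl);
        int[] current = parse(currentImpl);
        if (stored[0] != current[0]) {
            return Recommendation.REQUIRED;
        }
        if (stored[1] != current[1]) {
            return Recommendation.ADVISABLE;
        }
        return Recommendation.NONE;
    }

    // Also take the index version into account: any change there
    // means the index needs to be rebuilt.
    static Recommendation recommend(String storedImpl, String currentImpl,
                                    String storedIndex, String currentIndex) {
        if (!storedIndex.trim().equals(currentIndex.trim())) {
            return Recommendation.REQUIRED;
        }
        return recommend(storedImpl, currentImpl);
    }
}
```

The UI could then force a rebuild on REQUIRED, offer one on ADVISABLE,
and stay silent on NONE.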

Another thought: it might be nice to have a feature list with index
version numbers. For example, the one place that uses the class checks
whether the index version number is >= the version at which a
particular feature was added. So if we add a language analyzer, say
Indonesian, it would be present at, say, 1.3, but not earlier. So if
an Indonesian Bible had been indexed before 1.3, it can now be indexed
better.
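A feature list of that kind might look something like the sketch
below. The IndexFeatures class and the feature key are made up, and the
1.3 for Indonesian is only the hypothetical number from the example
above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical feature list: feature name -> index version that added it.
final class IndexFeatures {
    // Each value is {major, minor}. The single entry is illustrative only.
    private static final Map<String, int[]> INTRODUCED_IN = new LinkedHashMap<>();
    static {
        INTRODUCED_IN.put("analyzer.indonesian", new int[] {1, 3});
    }

    // True if an index built at indexVersion already has the feature.
    static boolean isAvailable(String feature, String indexVersion) {
        int[] required = INTRODUCED_IN.get(feature);
        if (required == null) {
            return false; // unknown feature
        }
        String[] parts = indexVersion.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // Compare part by part so that 1.10 counts as later than 1.3.
        return major > required[0]
            || (major == required[0] && minor >= required[1]);
    }
}
```

Comparing the parts as integers, rather than treating the version as a
decimal number, keeps the check correct once the minor reaches two
digits.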

In Him,
	DM




