[jsword-devel] Search Highlight
DM Smith
dmsmith at crosswire.org
Wed Apr 29 05:08:07 MST 2009
On Apr 29, 2009, at 4:06 AM, Tonny Kohar wrote:
> Hi,
>
> On Mon, Apr 27, 2009 at 4:42 PM, Tonny Kohar <tonny.kohar at gmail.com>
> wrote:
>>
>> My initial finding is that there seems to be no need for an API
>> change. What it needs is simply a new package, e.g.
>> o.c.jsword.index.highlight, and inside that package a
>> static/factory/builder Highlight class which accepts raw text,
>> OSIS xml, or html output.
>> ....
>
> After trying that approach for a few days, sorry, that one does not
> work :), since it is not based on the original index.
>
> So I tried another approach: patching LuceneIndex to store (in an
> array/map) the Doc.FieldContent which matched the query, to avoid
> changing the index structure (e.g. adding a positional field), and
> then applying that to the osis xml text. This approach does not work
> either (or is difficult to map back) if the query is a phrase,
> because of the osis xml structure, e.g. a word can be surrounded by
> a lemma/strong tag. It does work for fuzzy (non-phrase) search.
>
> So I guess I will try another approach in the next few days. If I
> still can't get it right, I will put search highlighting aside due to
> the complexity involved and revisit it later.
Tonny,
I am excited that you are working on this!
I don't have a problem with changing the structure of the Lucene index
to include information that is useful. At some point, the index will
change.
We have the start of a mechanism to track the change so that the UI
can properly notify a user that an index rebuild is advisable. Here are
some scenarios:
1) More features are available in the engine. E.g. a new field can be
searched. Highlighting can be done. ...
2) The index has changed (as in the case of adding position/offset).
3) The underlying engine, Lucene, has changed.
The class o.c.j.index.lucene.IndexMetadata is used to track the
current implementation levels. It is used in only one place in JSword.
The class needs to be moved to o.c.j.index, the method
getLuceneVersion needs to be renamed to getImplementationVersion, and a
getImplementationName method needs to be added, in case we need to
change engines.
When an index is created, this metadata needs to be stored with the
index.
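
One way to store the metadata alongside an index is a small properties file in the index directory. This is only a sketch under assumptions: the class name StoredIndexMetadata, the file name, and the property keys are all hypothetical and are not part of JSword or Lucene.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: none of these names exist in JSword.
final class StoredIndexMetadata {
    // Assumed file name, written next to the index files.
    static final String FILE_NAME = "index-metadata.properties";

    // Record which engine and implementation version built the index.
    static void write(Path indexDir, String name, String version) throws IOException {
        Properties props = new Properties();
        props.setProperty("implementation.name", name);
        props.setProperty("implementation.version", version);
        try (OutputStream out = Files.newOutputStream(indexDir.resolve(FILE_NAME))) {
            props.store(out, "Index implementation metadata");
        }
    }

    // Read the metadata back; an index without the file yields empty properties.
    static Properties read(Path indexDir) throws IOException {
        Properties props = new Properties();
        Path file = indexDir.resolve(FILE_NAME);
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }
}
```

An index built before such a file existed would simply have no metadata, which a caller could treat as "version unknown".
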
If the implementation version has changed, the index may need to be
rebuilt. The number has three parts: major.minor.point. If the major
has changed, then it needs to be rebuilt. If the minor has changed,
it'd probably be best to rebuild. If the point has changed, it doesn't
need to be rebuilt.
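
The major/minor/point rule above can be sketched as a three-level answer. The names IndexVersion and RebuildAdvice are hypothetical, chosen only for illustration, and treating a missing part as 0 (so "1.1" reads as 1.1.0) is an assumption.

```java
// Hypothetical sketch: these types do not exist in JSword.
enum RebuildAdvice { NOT_NEEDED, RECOMMENDED, REQUIRED }

final class IndexVersion {
    final int major;
    final int minor;
    final int point;

    IndexVersion(int major, int minor, int point) {
        this.major = major;
        this.minor = minor;
        this.point = point;
    }

    // Parse "major.minor.point"; missing parts are assumed to be 0.
    static IndexVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        int[] n = new int[3];
        for (int i = 0; i < parts.length && i < 3; i++) {
            n[i] = Integer.parseInt(parts[i]);
        }
        return new IndexVersion(n[0], n[1], n[2]);
    }

    // Compare the version stored with an index (this) against the current one.
    RebuildAdvice adviceFor(IndexVersion current) {
        if (major != current.major) {
            return RebuildAdvice.REQUIRED;    // major changed: must rebuild
        }
        if (minor != current.minor) {
            return RebuildAdvice.RECOMMENDED; // minor changed: best to rebuild
        }
        return RebuildAdvice.NOT_NEEDED;      // only the point differs, or nothing
    }
}
```

For example, IndexVersion.parse("1.1").adviceFor(IndexVersion.parse("1.2.0")) gives RECOMMENDED, which a UI could turn into an optional prompt rather than a forced rebuild.
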
When an index is used, if the index version has changed, then the
index needs to be rebuilt. It'd probably be good to move to the
major.minor.point notation for our releases.
Right now, the index version is a simple incrementing number, starting
at 1.1, but it could change easily, given that there is only one use
of it.
IndexManager should be changed to ask for a recommendation on whether
to rebuild the index. From the above, it should be a multi-level answer.
Another thought: it might be nice to have a feature list with index
version numbers. For example, the one place that uses the class checks
whether the index version number is >= the version at which a
particular feature was added. So if we add a language analyzer, say
Indonesian, that would be present at, say, 1.3, but not earlier. So if
an Indonesian Bible had been indexed before 1.3, it can now be indexed
better.
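
A feature list of this kind could be a simple map from feature name to the index version that introduced it. The sketch below is hypothetical: the class IndexFeatures and the key "analyzer.indonesian" are made up for illustration, and 1.3 is just the example figure used above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not an existing JSword class.
final class IndexFeatures {
    // Feature name -> {major, minor} of the index version that introduced it.
    private static final Map<String, int[]> SINCE = new LinkedHashMap<>();
    static {
        SINCE.put("analyzer.indonesian", new int[] {1, 3}); // example value only
    }

    // True if an index built at major.minor already includes the feature.
    static boolean isAvailable(String feature, int major, int minor) {
        int[] since = SINCE.get(feature);
        if (since == null) {
            return false; // unknown feature
        }
        return major > since[0] || (major == since[0] && minor >= since[1]);
    }

    // True if the feature is known but the index predates it,
    // so rebuilding would produce a better index.
    static boolean rebuildWouldHelp(String feature, int major, int minor) {
        return SINCE.containsKey(feature) && !isAvailable(feature, major, minor);
    }
}
```

With this, an Indonesian Bible indexed at 1.2 would report rebuildWouldHelp("analyzer.indonesian", 1, 2) as true, while one indexed at 1.3 or later would not.
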
In Him,
DM