[jsword-devel] Lucene 2.9 and JSword

Tonny Kohar tonny.kohar at gmail.com
Mon Nov 23 09:28:03 MST 2009


Hi,

I noticed that JSword is now using Apache Lucene 2.9. I was reading
an interesting article about Lucene 2.9, in particular the part on the
"term vector-based highlighter", quoted as follows:

"Term vector-based highlighter: a new term highlighter implementation based on
term vectors (essentially a view of terms, offsets, and positions in a documents
field). It supports features like N-Gram fields and phrase-unit highlighting
with slops and yields good performance on large documents. The downside is that
it requires a lot more disk space due to stored term vectors."

So I was wondering: can this new Lucene 2.9 feature be used to provide
search highlighting in JSword?

I ask because I do not know much about Lucene 2.9, and because it
seems easy enough (correct me if I am wrong; the hard work is already
done by Lucene itself): just add the word/term offsets to the index,
retrieve them during search, and apply the highlight to the output
HTML/XML.
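
To make the last step concrete, here is a minimal sketch of what I have in
mind, in plain Java with no Lucene calls. The class OffsetHighlightSketch,
the Span holder and the <b> markup are all hypothetical names I made up for
illustration; it assumes the offsets have already been retrieved from the
index, are sorted, and do not overlap.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: wrap already-known term offsets in highlight markup.
class OffsetHighlightSketch {

    // One matched term as [start, end) character offsets into the text.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumes spans are sorted by start and do not overlap.
    // The text itself is copied as-is (no HTML escaping in this sketch).
    static String highlight(String text, List<Span> spans) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Span s : spans) {
            out.append(text, pos, s.start);          // text before the match
            out.append("<b>");
            out.append(text, s.start, s.end);        // the matched term
            out.append("</b>");
            pos = s.end;
        }
        out.append(text, pos, text.length());        // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String verse = "In the beginning God created";
        int i = verse.indexOf("God");
        // prints: In the beginning <b>God</b> created
        System.out.println(highlight(verse, Arrays.asList(new Span(i, i + 3))));
    }
}
```

In a real implementation the spans would come from the stored term vectors
rather than from indexOf, and the markup would be whatever the JSword
HTML/XML output expects.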

The questions are:
- Is my assumption correct?
- Can it be used for languages other than English?
- Does UTF-8 (or the text encoding used by CrossWire modules) allow
offset/byte counting?
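
To illustrate why I ask the last question, here is a small plain-Java sketch
showing that a character offset and a UTF-8 byte offset diverge as soon as
the text contains non-ASCII characters. The class and method names are
hypothetical. My understanding is that Lucene reports token offsets as
character positions in the Java string rather than as byte positions, but
please treat that as an assumption to be confirmed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: compare character offsets with UTF-8 byte offsets.
class OffsetVersusByteSketch {

    // Offset of the term counted in Java chars (UTF-16 code units), or -1.
    static int charOffset(String text, String term) {
        return text.indexOf(term);
    }

    // Offset of the same term counted in UTF-8 bytes, or -1 if not found.
    static int utf8ByteOffset(String text, String term) {
        int i = text.indexOf(term);
        if (i < 0) {
            return -1;
        }
        // Encode only the prefix before the term and count its bytes.
        return text.substring(0, i).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "Se\u00f1or Dios": the n-with-tilde is one char but two UTF-8 bytes.
        String text = "Se\u00f1or Dios";
        System.out.println(charOffset(text, "Dios"));     // prints 6
        System.out.println(utf8ByteOffset(text, "Dios")); // prints 7
    }
}
```

If the highlight is applied to a Java String, the character offset is the
one that String.substring expects; the byte offset would only matter when
working directly on the encoded module data.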

Cheers
Tonny Kohar
--
Alkitab Bible Study
http://www.kiyut.com
