[sword-devel] can't do lucene Hebrew searches in KJV
DM Smith
dmsmith at crosswire.org
Thu Jan 20 10:30:22 MST 2011
On 01/20/2011 11:29 AM, Karl Kleinpaste wrote:
> BTW, belated thanx to Nic for pointing that out for us.
>
> I have to note that the Strong's content isn't zero-prefixed so as to
> generate exactly-5-digits entries, either. Gen 1:1...
>
> |<w lemma="strong:H07225">In the beginning</w> <w
> | lemma="strong:H0430">God</w> <w lemma="strong:H0853 strong:H01254"
> | morph="strongMorph:TH8804">created</w> <w lemma="strong:H08064">the
> | heaven</w> <w lemma="strong:H0853">and</w> <w lemma="strong:H0776">the
> | earth</w>.
>
> It's just an arbitrary, single, leading zero on all entries. Even Gen
> 2:24's use of H1 is encoded as H01.
>
> "sed -e 's/strong:H0/strong:H/g'" has a salutary and satisfying effect.
> I've just replaced my KJV content with the result of doing so. Much nicer.
>
> Interesting that similar encoding is not present for the NT Greek,
> so no such fix is needed there.
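For readers without sed at hand, here is a minimal Python sketch of the same substitution, applied to part of the Gen 1:1 markup quoted above. Like the sed expression with its `g` flag, `str.replace` rewrites every occurrence, but it removes only one `0` per match; that is enough here only because the module uses a single leading zero. The function name `strip_hebrew_zero` is hypothetical.

```python
# A fragment of the Gen 1:1 markup quoted above, with its single leading zeros.
GEN_1_1 = (
    '<w lemma="strong:H07225">In the beginning</w> '
    '<w lemma="strong:H0430">God</w> '
    '<w lemma="strong:H0853 strong:H01254" morph="strongMorph:TH8804">created</w>'
)


def strip_hebrew_zero(text: str) -> str:
    """Same effect as sed -e 's/strong:H0/strong:H/g'.

    Every 'strong:H0' becomes 'strong:H', so exactly one zero is dropped per
    occurrence. 'strongMorph:TH8804' is untouched because it does not contain
    the substring 'strong:H'.
    """
    return text.replace("strong:H0", "strong:H")


print(strip_hebrew_zero(GEN_1_1))
```

Note that a value such as `strong:H00430` would only lose one zero (`strong:H0430`), exactly as with the sed command.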
I find this interesting as the keeper of the KJV module.
Going back to the baseline of the current effort (i.e., the KJV2003
project), the encoding has not changed.
Since this is the first I have heard of the problem, I wonder whether
a change in the SWORD engine has produced a regression. Looking at
the code, I don't see anything out of the ordinary. To search, the user
has to supply the Strong's number exactly as it is encoded in the module
(e.g., H0430 rather than H430). It looks like it has been this way "forever".
For a search to work, the search request and the stored key need to be
the same. In JSword, we satisfy this by normalizing the Strong's number
when constructing the Lucene index. We normalize the user's request the
same way.
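A minimal Python sketch of normalizing on both sides, assuming a plain dictionary as a stand-in for the Lucene index and a canonical key made of the language letter plus a number zero-padded to five digits. The helper names (`normalize_strongs`, `build_index`, `search`), the five-digit form, and the toy verse data (a few of the lemmas mentioned in this thread) are illustrative assumptions, not JSword's actual API or key format; lemmas are assumed to have their `strong:` prefix already removed.

```python
import re
from collections import defaultdict

# Assumed shape of a Strong's number: G or H followed by digits.
_STRONGS = re.compile(r"([GH])(\d+)", re.IGNORECASE)


def normalize_strongs(raw: str) -> str:
    """Map 'H430', 'H0430', 'h00430' ... to one canonical key, here 'H00430'."""
    m = _STRONGS.fullmatch(raw.strip())
    if m is None:
        raise ValueError(f"not a Strong's number: {raw!r}")
    letter, digits = m.group(1).upper(), m.group(2)
    return f"{letter}{int(digits):05d}"  # int() discards any leading zeros


def build_index(verses: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index time: normalize each stored lemma before adding it to the index."""
    index = defaultdict(set)
    for ref, lemmas in verses.items():
        for lemma in lemmas:
            index[normalize_strongs(lemma)].add(ref)
    return dict(index)


def search(index: dict[str, set[str]], user_query: str) -> set[str]:
    """Query time: normalize the user's request the same way, then exact-match."""
    return index.get(normalize_strongs(user_query), set())


# Toy data in the module's own spelling (single leading zero).
VERSES = {
    "Gen.1.1": ["H07225", "H0430", "H0853", "H01254", "H08064", "H0776"],
    "Gen.2.24": ["H01"],
}

idx = build_index(VERSES)
print(search(idx, "H430"), search(idx, "H0430"), search(idx, "H1"))
```

Because both sides pass through the same function, the user no longer needs to know how many zeros the module happens to use; without normalization on one side, the exact-match lookup would miss whenever the two spellings differ.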
We also normalize the Strong's number when displaying it; there is
no sense in the user seeing the internal representation.
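The display side can be a separate mapping back to the conventional, unpadded form. A minimal sketch, assuming internal keys made of a letter followed by a possibly zero-padded number (e.g., H07225 as stored in the module, or a five-digit key such as H00001); `display_strongs` is a hypothetical name, and anything it does not recognize is returned unchanged.

```python
import re

# Assumed internal key shape: G or H followed by digits, possibly zero-padded.
_KEY = re.compile(r"([GH])(\d+)")


def display_strongs(key: str) -> str:
    """Render an internal key such as 'H07225' or 'H00001' as 'H7225' or 'H1'."""
    m = _KEY.fullmatch(key)
    if m is None:
        return key  # leave anything unrecognized untouched
    return f"{m.group(1)}{int(m.group(2))}"


for k in ("H07225", "H0430", "H00001", "G3056"):
    print(k, "->", display_strongs(k))
```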
So, it seems to me that the question is: What is the proper way to fix
the problem?
In Him,
DM