[jsword-devel] Lucene Indexes

DM Smith dmsmith555 at yahoo.com
Wed May 21 10:32:44 MST 2008


Mullins, Steven wrote:
> I have jsword indexing the lexical forms and the robinson codes
> for the MorphGNT module.  The syntax is:
>
> rob:n-nsm && lex:?????
>   
Hmm. I thought that robinson morphology was already handled by JSword by 
stuffing it in morph:

> This will search for all verses with the lexical form "?????"
> and the robinson morphological code n-nsm.  However, "?????"
> can be anywhere in the verse and the "n-nsm" tag can apply to
> any word in the verse.  I'd like to restrict the search so that
> the robinsons search applies only to a particular word.  For
> example:
>
> (lex:????? WITH rob:n-nsm)
> Translation: Search all words with lexical form "?????", which 
> also has a robinson's code of "n-nsm".  
>
> I don't know how or if Lucene establishes the relationship between
> fields.  Is there a way to establish a link between the <content>
> field and <lex> and <rob> field? 
>   
This would be a great question for the lucene-users mailing list.

As far as I know, this has not been done. But, it appears that there is 
enough information held in the index to perform such a search.

That is, each term (token?) in the index is tied to its offset and 
length in the text, and each is given a position. For each field, the 
first term would be 1, the next 2, .... Thus, two fields can be treated 
as parallel arrays.
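
To make the parallel-array idea concrete, here is a minimal sketch in
plain Java. It does not use Lucene at all; it only models one verse as
two per-field token arrays and checks both fields at the same position.
The class name, method name and sample terms ("alpha", "beta") are
made up for illustration, and the positions are zero-based array
indexes rather than whatever numbering the index actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of two fields of one verse, aligned by word position. */
final class ParallelFields {

    /** Returns every position where lex and rob both hold the requested terms. */
    static List<Integer> matchSameWord(String[] lex, String[] rob,
                                       String lexTerm, String robTerm) {
        List<Integer> hits = new ArrayList<>();
        // Only positions present in both arrays can be compared.
        int n = Math.min(lex.length, rob.length);
        for (int pos = 0; pos < n; pos++) {
            if (lex[pos].equals(lexTerm) && rob[pos].equals(robTerm)) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Placeholder lemmas; the codes only imitate the n-nsm style.
        String[] lex = {"alpha", "beta", "alpha"};
        String[] rob = {"n-nsm", "v-pai-1s", "n-asm"};

        // Same word carries both values, so this matches.
        System.out.println(matchSameWord(lex, rob, "alpha", "n-nsm"));
        // Both values occur in the verse, but on different words.
        System.out.println(matchSameWord(lex, rob, "beta", "n-nsm"));
    }
}
```

The second call is the case a plain "rob:n-nsm && lex:..." search gets
wrong: it would report the verse even though no single word has both
values.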

Also, it is possible to fudge the position increment. For example, when 
a <w> element is being processed, every word from it that is stuffed 
into the content field could be given the same position, so that there 
is one position per <w> element.
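
As a rough sketch of that position bookkeeping, again in plain Java
with no Lucene dependency: each <w> element advances the position by
one, and every word inside it is placed at that same position. In
Lucene terms this corresponds to giving the later words a position
increment of 0. The class and method names are hypothetical, and the
input is assumed to be already parsed into one list of words per <w>
element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assigns one position per <w> element. */
final class WordPositions {

    /** One emitted content token and the position it was given. */
    static final class Placed {
        final String text;
        final int position;

        Placed(String text, int position) {
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return text + "@" + position;
        }
    }

    /** Places all words of the same <w> element at the same position. */
    static List<Placed> place(List<List<String>> wElements) {
        List<Placed> out = new ArrayList<>();
        int position = -1;
        for (List<String> words : wElements) {
            // Advance once per element, even if it holds several words,
            // so content stays aligned with the lex and morph fields.
            position++;
            for (String word : words) {
                out.add(new Placed(word, position));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // First <w> holds two words, second holds one (made-up sample).
        List<List<String>> verse = Arrays.asList(
                Arrays.asList("in", "the"),
                Arrays.asList("beginning"));
        System.out.println(place(verse));
    }
}
```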

This would give the morph:, lex:, content:, ... fields a way to be 
connected in parallel.

The other way would be to think of each field as a table in a database, 
indexed by document number, and ignore the whole notion of position.

Then, one would create fields for relationships, so the morph <-> lex 
relationship would be held in a morph_lex: field and searched as such.

Then one could search:
morph_lex:("xxx n-nsm")

The obvious problem with this is one can only exploit relationships that 
are explicitly defined, while the first solution is more general.
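
A minimal sketch of what such a relationship field might hold, in
plain Java without Lucene. It assumes the field is built by writing
each word's lex value immediately followed by its morph value, so that
a two-term phrase like "xxx n-nsm" can only match when both belong to
the same word. The class and method names and the sample terms are
hypothetical, and the phrase check is a stand-in for a real phrase
query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for a combined morph_lex field. */
final class MorphLexField {

    /** Interleaves the two fields: lex0, morph0, lex1, morph1, ... */
    static List<String> buildField(String[] lex, String[] morph) {
        if (lex.length != morph.length) {
            throw new IllegalArgumentException("lex and morph must be parallel");
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < lex.length; i++) {
            tokens.add(lex[i]);
            tokens.add(morph[i]);
        }
        return tokens;
    }

    /** True if the two terms occur next to each other, in this order. */
    static boolean containsPhrase(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Placeholder lemmas; the codes only imitate the n-nsm style.
        List<String> field = buildField(
                new String[] {"alpha", "beta"},
                new String[] {"n-nsm", "v-pai-1s"});
        System.out.println(containsPhrase(field, "alpha", "n-nsm"));
        System.out.println(containsPhrase(field, "beta", "n-nsm"));
    }
}
```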

The trick would be to synthesize combo search expressions on the fly.
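
One way that synthesis might look, as a sketch only: rewrite the
proposed "lex:X WITH rob:Y" form into a search on the combined field
before the query is parsed. The WITH keyword is just the syntax
suggested in this thread, not something Lucene or JSword supports, and
the class name, the regular expression and the choice of morph_lex as
the target field are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-parser that turns WITH pairs into morph_lex searches. */
final class ComboQueryRewriter {

    // Matches "lex:TERM WITH rob:CODE" (or morph:CODE); stops before ')'.
    private static final Pattern WITH_PAIR = Pattern.compile(
            "lex:(\\S+)\\s+WITH\\s+(?:rob|morph):([^\\s)]+)");

    /** Replaces every WITH pair and leaves the rest of the query alone. */
    static String rewrite(String query) {
        Matcher m = WITH_PAIR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String combined = "morph_lex:(\"" + m.group(1) + " " + m.group(2) + "\")";
            // quoteReplacement guards against '$' or '\' in the terms.
            m.appendReplacement(out, Matcher.quoteReplacement(combined));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("(lex:xxx WITH rob:n-nsm)"));
    }
}
```
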
> Perhaps this is already done, but if so, I do not know the syntax
> to employ it.
>
> Thanks,
>
> Steve
>
> -----Original Message-----
> From: DM Smith [mailto:dmsmith555 at yahoo.com]
> Sent: Mon, May 19, 2008 12:00 PM
> To: J-Sword Developers Mailing List
> Subject: Re: [jsword-devel] Lucene Indexes
>
>
> Mullins, Steven wrote:
>   
>> The beauty of the MorphGNT module is that the analysis is already done!
>> for every inflected word, you have tagged the lexical form (to search by)
>> and the morphological tag (to narrow a search).  So for example if I wanted
>> to search for all verses with "believe" in the first person active
>> indicative with "Jesus" as a direct object, I could, if only I had the 
>> lexical form and morph tags in lucene working.
>>
>> Just my 2-cents.
>>
>> Steve
>>   
>>     
>
> I think it's worth more than a couple of cents!
>
> One of the things that we have on our todo list (Jira's down 
> indefinitely, so don't bother looking) is to create a Strong's index 
> that could be used for any module. So if anyone ever had strong:(H3068 
> AND H3069) in a search, they would find the verse in their favorite text.
>
> We could do something similar with the analysis of MorphGNT.
>
> BTW, I welcome contributions as I'll be focusing on RTL issues, 
> translations into other languages and Bookmarks.
>
> In Him,
>     DM
>
>
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel
>
>   




