[jsword-devel] Lucene Indexes

DM Smith dmsmith555 at yahoo.com
Sun May 25 22:37:03 MST 2008


Mullins, Steven wrote:
> DM,
>
> I'm still learning Lucene lingo so please bear with me.
> Right now the target for a search is verse.  This can
> be expanded using the ~n syntax.  Can the search target
> be reduced to a word?  If it could, then the lex-rob
> search would only find words where both conditions 
> match.  Perhaps then only the verses which contain the
> matched words could be returned. 
>   
You can search for a single word by just searching for that word. If you 
require two words, prefix each with + (as in Google) or join them with 
AND. The problem is finding parallel values in different fields, for 
example, requiring that the third lemma and the third morph code in a 
verse belong to the same word.
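
To make the problem concrete, here is a rough sketch in plain Python (not 
Lucene and not JSword code). It compares what a verse-level search of the 
form +lex:... +rob:... effectively tests with what is actually wanted. 
The verse data, the lemma names and both function names are made up for 
illustration.

```python
# Hypothetical verse: each word is (surface form, lemma, Robinson code).
# The values are illustrative only and are not taken from MorphGNT.
verse = [
    ("word1", "lemma_a", "n-asm"),
    ("word2", "lemma_b", "n-nsm"),
]


def verse_level_match(words, lemma, code):
    """Mimic a verse-level AND: each field is tested on its own."""
    lex_field = {w[1] for w in words}
    rob_field = {w[2] for w in words}
    return lemma in lex_field and code in rob_field


def word_level_match(words, lemma, code):
    """What is wanted: one and the same word carries both values."""
    return any(w[1] == lemma and w[2] == code for w in words)


print(verse_level_match(verse, "lemma_a", "n-nsm"))  # True (a false hit)
print(word_level_match(verse, "lemma_a", "n-nsm"))   # False
```

The first call reports a hit even though the lemma and the code sit on 
different words, which is exactly the mismatch described above.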

Hope this helps.

In Him,
    DM
> I was composing a message for the java-user lucene
> list when this question occurred to me.
>
> Thanks,
>
> Steve
>
>
> -----Original Message-----
> From: DM Smith
> Sent: Wed, May 21, 2008 1:33 PM
> To: J-Sword Developers Mailing List
> Subject: Re: [jsword-devel] Lucene Indexes
>
>
> Mullins, Steven wrote:
>   
>> I have jsword indexing the lexical forms and the robinson codes
>> for the MorphGNT module.  The syntax is:
>>
>> rob:n-nsm && lex:?????
>>   
>>     
> Hmm. I thought that robinson morphology was already handled by JSword by 
> stuffing it in morph:
>
>   
>> This will search for all verses with the lexical form "?????"
>> and the robinson morphological code n-nsm.  However, "?????"
>> can be anywhere in the verse and the "n-nsm" tag can apply to
>> any word in the verse.  I'd like to restrict the search so that
>> the robinsons search applies only to a particular word.  For
>> example:
>>
>> (lex:????? WITH rob:n-nsm)
>> Translation: Search for all words with lexical form "?????" that 
>> also have a robinson's code of "n-nsm".  
>>
>> I don't know how or if Lucene establishes the relationship between
>> fields.  Is there a way to establish a link between the <content>
>> field and <lex> and <rob> field? 
>>   
>>     
> This would be a great question for the lucene-users mailing list.
>
> As far as I know, this has not been done. But, it appears that there is 
> enough information held in the index to perform such a search.
>
> That is, each term (token?) in the index is tied to its offset and 
> length in the text and each is given a position. For each field, the first 
> term would be 1, the next 2, .... Thus, two fields can be parallel arrays.
>
> Also, it is possible to fudge the position increment: when a <w> 
> element is being processed, every word from it that is stuffed into 
> the content field can be given the same position.
>
> This would give morph:, lex:, content:, ... a way to be connected in 
> parallel.
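
The parallel-position idea can be sketched with a toy positional index in 
plain Python. This is only a model of the idea, not Lucene's API and not 
something JSword does today. The names build_index and aligned_search, the 
sample verses and the lemma names are all assumptions made for the 
example. Positions are numbered from 1 to match the description above.

```python
from collections import defaultdict


def build_index(docs):
    """Toy positional index: index[field][term][doc_id] -> set of positions.

    docs maps doc_id -> {field: [term, ...]}. The lists for the different
    fields are parallel: entry i of each list describes the same word.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for doc_id, fields in docs.items():
        for field, terms in fields.items():
            for pos, term in enumerate(terms, start=1):
                index[field][term][doc_id].add(pos)
    return index


def aligned_search(index, lex_term, morph_term):
    """Return {doc_id: [positions]} where both terms share a position."""
    lex_hits = index["lex"][lex_term]
    morph_hits = index["morph"][morph_term]
    result = {}
    for doc_id in lex_hits.keys() & morph_hits.keys():
        shared = lex_hits[doc_id] & morph_hits[doc_id]
        if shared:
            result[doc_id] = sorted(shared)
    return result


# Hypothetical data: two verses, two words each.
docs = {
    "verse1": {"lex": ["lemma_a", "lemma_b"], "morph": ["n-asm", "n-nsm"]},
    "verse2": {"lex": ["lemma_a", "lemma_c"], "morph": ["n-nsm", "v-pai-1s"]},
}
index = build_index(docs)
print(aligned_search(index, "lemma_a", "n-nsm"))  # {'verse2': [1]}
```

Both verses contain lemma_a and n-nsm somewhere, but only verse2 has them 
at the same position, so only verse2 is returned.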
>
> The other way would be to think of each field as a table in a database, 
> indexed by document number, and ignore the whole notion of position.
>
> Then, one would create fields for relationships, so the morph <-> lex 
> relationship would be held in a morph_lex: field and searched as such.
>
> Then one could search:
> morph_lex:("xxx n-nsm")
>
> The obvious problem with this is one can only exploit relationships that 
> are explicitly defined, while the first solution is more general.
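
A rough sketch of the relationship-field idea, again in plain Python and 
not Lucene code. Here each word contributes one combined token to a 
morph_lex field. The "lemma|code" token format, the function names and the 
sample data are assumptions made for the example (the query above uses a 
phrase such as "xxx n-nsm" instead).

```python
def morph_lex_tokens(words):
    """Build one combined token per word from (lemma, code) pairs."""
    return [f"{lemma}|{code}" for lemma, code in words]


def search_morph_lex(verses, lemma, code):
    """Return the references of verses whose morph_lex field has the pair."""
    wanted = f"{lemma}|{code}"
    return [ref for ref, words in verses.items()
            if wanted in morph_lex_tokens(words)]


# Hypothetical data: verse reference -> list of (lemma, code) per word.
verses = {
    "verse1": [("lemma_a", "n-asm"), ("lemma_b", "n-nsm")],
    "verse2": [("lemma_a", "n-nsm"), ("lemma_c", "v-pai-1s")],
}
print(search_morph_lex(verses, "lemma_a", "n-nsm"))  # ['verse2']
```

The limitation shows up directly: only the lemma/code pairing was written 
into the field, so any other pairing (say, content with morph) would need 
its own field built at index time.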
>
> The trick would be to synthesize combo search expressions on the fly.
>   
>> Perhaps this is already done, but if so, I do not know the syntax
>> to employ it.
>>
>> Thanks,
>>
>> Steve
>>
>> -----Original Message-----
>> From: DM Smith 
>> Sent: Mon, May 19, 2008 12:00 PM
>> To: J-Sword Developers Mailing List
>> Subject: Re: [jsword-devel] Lucene Indexes
>>
>>
>> Mullins, Steven wrote:
>>   
>>     
>>> The beauty of the MorphGNT module is that the analysis is already done!
>>> For every inflected word, you have tagged the lexical form (to search by)
>>> and the morphological tag (to narrow a search).  So for example if I wanted
>>> to search for all verses with "believe" in the first person active
>>> indicative with "Jesus" as a direct object, I could, if only I had the 
>>> lexical form and morph tags in lucene working.
>>>
>>> Just my 2-cents.
>>>
>>> Steve
>>>   
>>>     
>>>       
>> I think it's worth more than a couple of cents!
>>
>> One of the things that we have on our todo list (Jira's down 
>> indefinitely, so don't bother looking) is to create a Strong's index 
>> that could be used for any module. So if anyone ever had strong:(H3068 
>> AND H3069) in a search, they would find the verse in their favorite text.
>>
>> We could do something similar with the analysis of MorphGNT.
>>
>> BTW, I welcome contributions as I'll be focusing on RTL issues, 
>> translations into other languages and Bookmarks.
>>
>> In Him,
>>     DM
>>     




