[sword-devel] Stem searching

Daniel Owens dcowens76 at gmail.com
Thu Jul 12 10:12:32 MST 2012


Troy,

I am excited about this kind of search capability. This is great work.

I have a question. Will this solution also cover searching for a morph 
value for any lemma? It might look like:

morph:*@mor1

instead of

morph:lem1@mor1

In other words, I want to be able to find all the masculine singular 
nouns, regardless of lemma.
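
A rough sketch of the intended matching, in plain Python rather than Lucene or SWORD code. The verse ids and tokens are made up, and fnmatch only stands in for whatever wildcard support the index actually provides:

```python
from fnmatch import fnmatchcase

# Hypothetical per-verse contents of the proposed 'morph' field.
verses = {
    "v1": ["lem1@mor1", "lem2@mor2"],
    "v2": ["lem3@mor1", "lem4@mor4"],
    "v3": ["lem1@mor3"],
}

def search(pattern):
    """Return ids of verses with at least one token matching the pattern."""
    return sorted(
        vid for vid, tokens in verses.items()
        if any(fnmatchcase(tok, pattern) for tok in tokens)
    )

print(search("*@mor1"))  # any lemma carrying morph code mor1
print(search("lem1@*"))  # lem1 with any morph code
```

One caveat worth checking: Java Lucene's classic QueryParser rejects a leading wildcard unless setAllowLeadingWildcard(true) is set. Whether the clucene setup discussed here behaves the same way is not stated in this thread.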

Daniel

On 07/12/2012 10:01 AM, Troy A. Griffitts wrote:
> Hey Chris,
>
> A relational database will not contribute more to a solution than what 
> we have available in lucene.  What I failed to get across in my last 
> email, due to too much caffeine, was that a verse's declension data by 
> itself is useless without being attached to the lemma which each morph 
> code in the declension data modifies.
>
> We have 2 things for each word:
>
> root@declension
>
> we refer to these as:
>
> lemma@morph
>
> root, stem, lemma, in this discussion are all synonyms.
>
>
> Currently in our lucene index we have a field called 'lemma', so for a 
> verse with 4 words, this field might look something like this:
>
> lem1 lem2 lem3 lem4
>
> and we can do searches for all verses with lem3
>
> lemma:lem3
>
> great, but this ignores the declension data; e.g., was lem3 a 1st 
> person or 2nd person noun?  Ignoring declension is usually desired 
> when doing word studies, and why we have the 'lemma' lucene index in 
> the first place.  You don't want to have to search for all forms of a 
> word to do a word study.
>
> ... but sometimes you only care about 1 form of a word when doing a 
> study, so how do we incorporate the declension information?
>
> It would be useless to create a 'morph' field with contents for the 
> same verse as:
>
> mor1 mor2 mor3 mor4
>
> In this scenario, you could construct a clucene search using both 
> fields like this:
>
> lemma:lem2 morph:mor2
>
> but this would not return what you desire.  This would return all 
> verses which have a lem2 in the lemma field and a mor2 in the morph 
> field, but not necessarily together.
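
The false positive described above can be reproduced with a small sketch in plain Python (not Lucene code). The verse is hypothetical: lem2 occurs, and mor2 occurs, but on different words:

```python
# Hypothetical verse: mor2 belongs to lem1, while lem2 carries mor9.
verse = [("lem1", "mor2"), ("lem2", "mor9")]

lemma_field = {lem for lem, _ in verse}
morph_field = {mor for _, mor in verse}
paired_field = {f"{lem}@{mor}" for lem, mor in verse}

# Two independent fields: both terms are present, so the verse matches.
separate_hit = "lem2" in lemma_field and "mor2" in morph_field

# One combined field: lem2@mor2 never occurs, so the verse does not match.
paired_hit = "lem2@mor2" in paired_field

print(separate_hit, paired_hit)
```

The two-field query reports a hit even though no single word is lem2 with declension mor2. The combined token does not.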
>
> So... the proposed solution...
> ++++++++++++++++++++++++++
>
> We have created a new field called 'morph' which will probably replace 
> the lemma field and has data as:
>
> lem1@mor1 lem2@mor2 lem3@mor3 lem4@mor4
>
> This allows a lucene search to be created like this:
>
> morph:lem2@mor2
>
> or to get the functionality of the current 'lemma' field-- which 
> ignores declension, the equiv search using the 'morph' field would be:
>
> morph:lem2@*
>
> this allows all kinds of queries, like: give me all verses which have 
> lem1 and lem2 within 4 words of each other and lem2 must have the 
> declension mor2
>
> morph:"lem1@* lem2@mor2"~4
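
A simplified sketch of what such a proximity query asks for, in plain Python. It treats ~4 as "positions differ by at most 4", which is an assumption and not exactly Lucene's slop definition. The helper name and sample token lists are hypothetical:

```python
from fnmatch import fnmatchcase

def within(tokens, pat_a, pat_b, max_gap):
    """True if a token matching pat_a and a different token matching pat_b
    sit at positions no more than max_gap apart."""
    pos_a = [i for i, t in enumerate(tokens) if fnmatchcase(t, pat_a)]
    pos_b = [i for i, t in enumerate(tokens) if fnmatchcase(t, pat_b)]
    return any(a != b and abs(a - b) <= max_gap for a in pos_a for b in pos_b)

near = ["lem1@mor1", "lem5@mor5", "lem2@mor2"]
far = ["lem1@mor1"] + ["lem9@mor9"] * 6 + ["lem2@mor2"]
wrong_morph = ["lem1@mor1", "lem2@mor7"]

print(within(near, "lem1@*", "lem2@mor2", 4))
print(within(far, "lem1@*", "lem2@mor2", 4))
print(within(wrong_morph, "lem1@*", "lem2@mor2", 4))
```

As far as I know, the classic Lucene query parser does not expand wildcards inside a quoted phrase, so a query like the one above may need a different parser or a hand-built span query. That is worth verifying against the clucene version in use.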
>
> Hope this makes things clearer if there were any clouds :)
>
> Troy
>
> On 07/12/2012 02:17 PM, Chris Burrell wrote:
>> Thanks Troy. That helps put the task in perspective... An alternative 
>> would possibly be to store both strong and morphology indexes in a 
>> relational database. Then have a table mapping all the data together. 
>> I guess the mapping table would be based on one version of the Bible 
>> only.
>>
>> Cheers
>> Chris
>>
>>
>> On 11 July 2012 01:09, Troy A. Griffitts <scribe at crosswire.org> wrote:
>>
>>     Chris,
>>
>>     We've toyed around with the best way to add lemma+morph searching
>>     in SWORD but haven't finalized anything yet.
>>
>>     Indexing morphology codes won't help.  This would give you 2
>>     fields which need to be used together.
>>
>>     For example, if you wish to find λογος only in the nominative
>>     within 3 words of any present, active, indicative, 2nd person
>>     singular or plural verb, you could not satisfy your search.
>>
>>     Believe it or not, end users of tools like Bibleworks seem quite
>>     happy to learn odd syntax like:
>>
>>
>>     "λογος@* *@PAI2?"~3
>>
>>
>>     Of course GUI tools to help build that syntax for them are also
>>     desired.
>>
>>     This is the direction we're heading, but it would require changing
>>     the lemma encoding from Strong's numbers to lexical form.
>>
>>     Presently we could nearly obtain this by building an index as
>>     (from the start of John 1.1):
>>
>>     G1722@PREP G746@N-DSF G2258@V-IXI-3S
>>
>>     But this would require users to know strongs numbers rather than
>>     lexical form, which would almost certainly need a GUI to help
>>     them build the search syntax.
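
A minimal sketch of how such an index string could be assembled from tagged words, in plain Python. The function name is hypothetical, and the three (Strong's number, morphology code) pairs are the ones quoted above for the start of John 1.1:

```python
# Strong's number and morphology code for the first three words of John 1.1,
# as given in the message above.
words = [("G1722", "PREP"), ("G746", "N-DSF"), ("G2258", "V-IXI-3S")]

def morph_field(tagged_words):
    """Join each (lemma, morph) pair with '@' and separate words by spaces."""
    return " ".join(f"{lemma}@{morph}" for lemma, morph in tagged_words)

print(morph_field(words))
```

Swapping Strong's numbers for lexical forms would only change the first element of each pair. This assumes the analyzer used at index time keeps each lemma@morph token whole rather than splitting on '@'.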
>>
>>     Hope this helps,
>>
>>     Troy
>>
>>     On 07/10/2012 11:41 PM, Chris Burrell wrote:
>>>     Hello
>>>
>>>     Does anyone know of, or has anyone tried, some kind of stem search
>>>     with JSword? Is it implemented? Or would we need to do a bit more
>>>     work there?
>>>
>>>     Chris
>>>
>>>
>>>
>>>     _______________________________________________
>>>     jsword-devel mailing list
>>>     jsword-devel at crosswire.org  <mailto:jsword-devel at crosswire.org>
>>>     http://www.crosswire.org/mailman/listinfo/jsword-devel
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page