<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Hey Chris,<br>
<br>
A relational database won't contribute more to a solution than
what we already have available in lucene. What I failed to get across
in my last email, due to too much caffeine, was that a verse's
declension data is useless by itself; each morph code has to stay
attached to the lemma it modifies.<br>
<br>
We have 2 things for each word:<br>
<br>
root@declension<br>
<br>
we refer to these as:<br>
<br>
lemma@morph<br>
<br>
In this discussion, root, stem, and lemma are all synonyms.<br>
<br>
<br>
Currently in our lucene index we have a field called 'lemma', so
for a verse with 4 words, this field might look something like
this:<br>
<br>
lem1 lem2 lem3 lem4<br>
<br>
and we can do searches for all verses with lem3<br>
<br>
lemma:lem3<br>
<br>
Great, but this ignores the declension data; e.g., was lem3 a 1st
person or 2nd person noun? Ignoring declension is usually what you
want when doing word studies, and is why we have the 'lemma' lucene
index in the first place. You don't want to have to search for all
forms of a word to do a word study.<br>
<br>
... but sometimes you only care about 1 form of a word when doing
a study, so how do we incorporate the declension information?<br>
<br>
It would be useless to create a 'morph' field with contents for
the same verse as:<br>
<br>
mor1 mor2 mor3 mor4<br>
<br>
In this scenario, you could construct a clucene search using both
fields like this:<br>
<br>
lemma:lem2 morph:mor2<br>
<br>
but this would not return what you desire. This would return all
verses which have a lem2 in the lemma field and a mor2 in the
morph field, but not necessarily together.<br>
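<br>
To make the "not necessarily together" problem concrete, here is a rough sketch in plain Python (not lucene or clucene) of two parallel fields for a couple of made-up verses. The verse ids, token names and helper functions are all hypothetical, purely for illustration:<br>
<pre>
```python
# Toy model of verses indexed as two independent fields.
# Verse ids and token names (lem1, mor1, ...) are placeholders, not real data.
verses = {
    "v1": {"lemma": ["lem1", "lem2", "lem3"], "morph": ["mor1", "mor2", "mor3"]},
    "v2": {"lemma": ["lem2", "lem5", "lem6"], "morph": ["mor9", "mor2", "mor7"]},
}

def match_separate_fields(verse, lemma, morph):
    """True if the lemma and the morph code each occur somewhere in the verse."""
    return lemma in verse["lemma"] and morph in verse["morph"]

def match_same_word(verse, lemma, morph):
    """True only if a single word carries both the lemma and the morph code."""
    return (lemma, morph) in zip(verse["lemma"], verse["morph"])

separate_hits = sorted(k for k, v in verses.items() if match_separate_fields(v, "lem2", "mor2"))
same_word_hits = sorted(k for k, v in verses.items() if match_same_word(v, "lem2", "mor2"))
print(separate_hits)
print(same_word_hits)
```
</pre>
In v2, lem2 and mor2 sit on different words, so the two-field match returns the verse even though no single word is lem2 with declension mor2.<br>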
<br>
So... the proposed solution...<br>
++++++++++++++++++++++++++<br>
<br>
We have created a new field called 'morph' which will probably
replace the lemma field and has data as:<br>
<br>
lem1@mor1 lem2@mor2 lem3@mor3 lem4@mor4<br>
<br>
This allows a lucene search to be created like this:<br>
<br>
morph:lem2@mor2<br>
<br>
or, to get the functionality of the current 'lemma' field, which
ignores declension, the equivalent search using the 'morph' field
would be:<br>
<br>
morph:lem2@*<br>
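<br>
As a small sketch of how the combined field behaves, here is a plain Python model (again, not lucene itself). The function names are made up for this example, and the standard library's fnmatch stands in for the wildcard, which is an assumption about the intended matching rather than a description of how lucene evaluates it:<br>
<pre>
```python
from fnmatch import fnmatchcase

def build_morph_field(words):
    """Join (lemma, morph) pairs into one space-separated field of lemma@morph tokens."""
    return " ".join(f"{lemma}@{morph}" for lemma, morph in words)

def field_matches(field, pattern):
    """True if any token in the field matches the pattern; * matches any run of characters."""
    return any(fnmatchcase(token, pattern) for token in field.split())

# Hypothetical 4-word verse using the placeholder names from this email.
verse = [("lem1", "mor1"), ("lem2", "mor2"), ("lem3", "mor3"), ("lem4", "mor4")]
field = build_morph_field(verse)

print(field)
print(field_matches(field, "lem2@mor2"))  # one specific form of lem2
print(field_matches(field, "lem2@*"))     # any form of lem2, like the old lemma field
print(field_matches(field, "lem2@mor3"))  # lem2 and mor3 both occur, but on different words
```
</pre>
Because each token carries its own declension, the last query no longer matches just because lem2 and mor3 both appear somewhere in the verse.<br>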
<br>
This allows all kinds of queries, like: give me all verses which
have lem1 and lem2 within 4 words of each other, where lem2 must
have the declension mor2<br>
<br>
morph:"lem1@* lem2@mor2"~4<br>
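<br>
For what the proximity query is meant to return, here is a minimal Python sketch. It assumes "within 4 words" means the two matching tokens are at most 4 positions apart in either order; that is my reading, and it models the intended semantics only, not how lucene or clucene would parse or score such a query. The function name and sample fields are hypothetical:<br>
<pre>
```python
from fnmatch import fnmatchcase

def within_words(field, pattern_a, pattern_b, max_gap):
    """True if a token matching pattern_a and one matching pattern_b
    sit at most max_gap positions apart, in either order."""
    tokens = field.split()
    pos_a = [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern_a)]
    pos_b = [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern_b)]
    return any(abs(i - j) in range(1, max_gap + 1) for i in pos_a for j in pos_b)

# Made-up verse fields: the right form nearby, the wrong form nearby, the right form too far away.
near = "lem1@mor1 lem7@mor7 lem2@mor2 lem4@mor4"
wrong_form = "lem1@mor1 lem7@mor7 lem2@mor9 lem4@mor4"
too_far = "lem1@mor1 a@x b@x c@x d@x e@x lem2@mor2"

print(within_words(near, "lem1@*", "lem2@mor2", 4))
print(within_words(wrong_form, "lem1@*", "lem2@mor2", 4))
print(within_words(too_far, "lem1@*", "lem2@mor2", 4))
```
</pre>
Only the first verse should qualify: the second has lem2 in a different declension, and the third has the right form but 6 words away.<br>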
<br>
Hope this makes things clearer if there were any clouds :)<br>
<br>
Troy<br>
<br>
On 07/12/2012 02:17 PM, Chris Burrell wrote:<br>
</div>
<blockquote
cite="mid:CACQnaRWBbwe4Pc87ZimDaEnB4=Yj_q++Ht_x8SwmJOsm1OY5tA@mail.gmail.com"
type="cite">Thanks Troy. That helps put the task in perspective...
An alternative would possibly be to store both strong and
morphology indexes in a relational database. Then have a table
mapping all the data together. I guess the mapping table would be
based on one version of the Bible only.
<div>
<div><br>
</div>
<div>Cheers<br>
Chris</div>
<div><br>
<br>
<div class="gmail_quote">On 11 July 2012 01:09, Troy A.
Griffitts <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:scribe@crosswire.org" target="_blank">scribe@crosswire.org</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Chris,<br>
<br>
We've toyed around with the best way to add
lemma+morph searching in SWORD but haven't finalized
anything yet.<br>
<br>
Indexing morphology codes on their own won't help. This
would give you 2 fields which need to be used together.<br>
<br>
For example, if you wish to find λογος only in the
nominative within 3 words of any present, active,
indicative, 2nd person singular or plural verb, you
could not satisfy your search.<br>
<br>
Believe it or not, end users of tools like Bibleworks
seem quite happy to learn odd syntax like:<br>
<br>
"λογος@* *@PAI2?"~3<br>
<br>
<br>
Of course, GUI tools to help build that syntax for them
are also desired.<br>
<br>
This is the direction we're heading, but it would require
the lemma encoding to change from strongs to lexical form.<br>
<br>
Presently we could nearly obtain this by building an
index as (from the start of John 1.1):<br>
<br>
G1722@PREP G746@N-DSF G2258@V-IXI-3S<br>
<br>
But this would require users to know strongs numbers
rather than lexical form, which would almost certainly
need a GUI to help them build the search syntax.<br>
<br>
Hope this helps,<br>
<br>
Troy
<div>
<div class="h5"><br>
<br>
<br>
<br>
<br>
On 07/10/2012 11:41 PM, Chris Burrell wrote:<br>
</div>
</div>
</div>
<blockquote type="cite">
<div>
<div class="h5">Hello
<div><br>
</div>
<div>Has anyone tried, or does anyone know of, some kind
of stem search with JSword? Is it implemented? Or would
we need to do a bit more work there?</div>
<div><br>
</div>
<div>Chris</div>
<div><br>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<div class="im">
<pre>_______________________________________________
jsword-devel mailing list
<a moz-do-not-send="true" href="mailto:jsword-devel@crosswire.org" target="_blank">jsword-devel@crosswire.org</a>
<a moz-do-not-send="true" href="http://www.crosswire.org/mailman/listinfo/jsword-devel" target="_blank">http://www.crosswire.org/mailman/listinfo/jsword-devel</a>
</pre>
</div>
</blockquote>
<br>
<br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
<br>
</body>
</html>