[jsword-devel] WLC MORPH
Martin Gruner
mg.pub at gmx.net
Fri Jul 21 07:41:05 MST 2006
Hi Kirk,
[cross-posting: this is from a thread about exact morphological searches with
Kirk Lowery]
> >>> We'd also have to add another (redundant) element <seg ...
> >>> x-wlc-morphstring="lemma1@morphcode1"> to be able to perform searches
> >>> using the "lemma@morphcode" syntax.
> >>
> >> I'm having trouble visualizing why this is so, but I trust you that it
> >> *is* so! :-)
> >
> > The reason is as simple as it is stupid. Searching in Sword/BibleTime is
> > verse-based, not word-based, at the moment. So when I search for
> > "lemma:somelemma and morph:somemorph" to find "somelemma@somemorph", I
> > might get verses that contain one word with lemma=somelemma and a
> > different word with morph=somemorph. That's why, at least for now, we do
> > need the lemma@morph syntax somewhere. Unless a better option comes up,
> > of course.
>
> Ah! Now I understand. Okay, this brings up one of my favorite gripes
> about Bible software in general, including Sword: granularity of
> searching. All Bible software has its primary segmentation to be the
> "verse." Yet most users, even the most unsophisticated, look to the
> sentence or more likely the word level. Also, with lemma and parsing
> strings, there are likely to be many "false positives" because there are
> many morphemes with the same or similar strings in the same verse.
>
> If Sword would reset the granularity of text segmentation to the word,
> it would trump every other Bible software package out there!
>
> Okay, I'll stop. :-)
>
> > The search engine adds a field for everything, referring to a particular
> > verse. So the verse's text is one field, all of the individual morph,
> > lemma, footnote etc. strings are added as other distinct fields. So when
> > I search for "lemma:somelemma and morph:somemorph", the index is searched
> > for verses that have one matching lemma field and one matching morph
> > field. That's why the error mentioned above would disturb the user.
> >
> > Need to think more about it, perhaps there can be a way around this.
>
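
To make the false positive concrete, here is a minimal Python sketch. The toy
verses, the placeholder lemma/morph values, the function names and the "@"
joiner are all made up for illustration; this is not the actual Sword or
BibleTime indexing code, only the field-per-verse idea described above.

```python
from collections import defaultdict

# Hypothetical toy data: each verse is a list of (lemma, morph) pairs,
# one pair per word. The values are placeholders, not real WLC codes.
VERSES = {
    "v1": [("lemmaA", "morphX"), ("lemmaB", "morphY")],
    "v2": [("lemmaA", "morphY"), ("lemmaB", "morphX")],
}


def build_verse_index(verses):
    """Index every lemma, morph and combined token against its verse."""
    index = {
        "lemma": defaultdict(set),
        "morph": defaultdict(set),
        "combined": defaultdict(set),  # redundant lemma-plus-morph token
    }
    for ref, words in verses.items():
        for lemma, morph in words:
            index["lemma"][lemma].add(ref)
            index["morph"][morph].add(ref)
            index["combined"][f"{lemma}@{morph}"].add(ref)
    return index


def search_all(index, terms):
    """Return the verses that match every field:value pair in `terms`."""
    hits = None
    for field, value in terms.items():
        matches = index[field].get(value, set())
        hits = matches if hits is None else hits & matches
    return sorted(hits or [])


index = build_verse_index(VERSES)

# Verse-level AND: v2 matches although lemmaA and morphX sit on different words.
print(search_all(index, {"lemma": "lemmaA", "morph": "morphX"}))

# Combined token: only the verse where a single word carries both values.
print(search_all(index, {"combined": "lemmaA@morphX"}))
```

The first query returns both verses, the second only v1, which is why the
redundant combined string is needed as long as the index is verse-based.
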
> How about overlapping indexes? One at the verse level, one at the word
> level, and one at the morpheme level? It might take some reworking of the
> search engine, but maybe not so much? Then the user could choose which
> index he wants to search?
That's exactly what I thought.
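
A rough Python sketch of what I have in mind, assuming the same kind of
per-word lemma/morph data. The names and toy values are hypothetical, it
covers only the verse and word levels (not morphemes), and it says nothing
about how Sword or BibleTime would really store the indexes:

```python
from collections import defaultdict

# Hypothetical toy data: verse reference -> list of (lemma, morph) per word.
VERSES = {
    "v1": [("lemmaA", "morphX"), ("lemmaB", "morphY")],
    "v2": [("lemmaA", "morphY"), ("lemmaB", "morphX")],
}


def build_index(verses, granularity):
    """Build one index whose documents are verses or single words."""
    if granularity not in ("verse", "word"):
        raise ValueError("granularity must be 'verse' or 'word'")
    index = defaultdict(set)
    for ref, words in verses.items():
        for pos, (lemma, morph) in enumerate(words):
            # The document id decides the granularity of a hit.
            doc_id = ref if granularity == "verse" else (ref, pos)
            index[("lemma", lemma)].add(doc_id)
            index[("morph", morph)].add(doc_id)
    return index


def search(index, lemma, morph):
    """Return documents carrying both the given lemma and the given morph."""
    lemma_hits = index.get(("lemma", lemma), set())
    morph_hits = index.get(("morph", morph), set())
    return sorted(lemma_hits & morph_hits)


# Two overlapping indexes over the same text; the user picks one per search.
indexes = {g: build_index(VERSES, g) for g in ("verse", "word")}

print(search(indexes["verse"], "lemmaA", "morphX"))  # verse-level hits
print(search(indexes["word"], "lemmaA", "morphX"))   # word-level hits
```

With the word-level index the same "lemma and morph" query only hits the one
word that carries both values, so no redundant combined string is needed.
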
Let's first do the module right, and then think about integrating the
word-level index into BibleTime and, if possible, Sword later. That is why
I'm CC'ing to Troy and other lists as well.
God bless,
Martin