[sword-devel] Searching for hyphenated words?
DM Smith
dmsmith at crosswire.org
Sun Mar 3 06:46:36 MST 2013
You're not missing anything. These kinds of problems are handled best by normalization.
In my earlier post I was suggesting that we normalize a hyphenated word, say "God-ward", to its parts and the whole: "God", "ward" and "Godward".
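A minimal sketch of that expansion, in Python rather than SWORD or JSword code. The function name and the list of dash characters treated as equivalent to the hyphen are assumptions made for illustration; a real analyzer would apply the same step both when indexing and when parsing the search key.

```python
import re

# Dash-like characters folded to "-" before splitting. This list is an
# assumption for the sketch, not something taken from SWORD or JSword.
_DASH_RE = re.compile("[\u2010\u2011\u2012\u2013\u2014\u2212]")


def expand_hyphenated(token):
    """Return the terms to index for one token: its parts, then the joined whole."""
    unified = _DASH_RE.sub("-", token)
    parts = [p for p in unified.split("-") if p]
    if len(parts) < 2:
        # Not hyphenated (or empty): nothing to expand.
        return parts
    return parts + ["".join(parts)]


print(expand_hyphenated("God-ward"))
```

Because the en dash is folded to a hyphen first, a search key typed with an ordinary hyphen and module text that uses the en dash produce the same terms.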
Solving backward compatibility is fairly simple: give the built index a version number. If it doesn't match the value the normalizer expects, the index is invalid and can't be used. JSword has the code for such a mechanism, but it hasn't been woven in. One could go deeper than a single coarse-grained version number and have a version number for each feature that is part of an index.
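A rough sketch of such a check, again in Python and not the JSword mechanism itself. The marker file name, the feature names and their version numbers are all hypothetical; the point is only that an index built under different normalization rules is rejected instead of being searched.

```python
import json
from pathlib import Path

# Hypothetical per-feature versions the current normalizer expects.
EXPECTED_FEATURES = {"hyphenation": 1, "ligatures": 1}

# Hypothetical marker file stored next to the index.
MARKER_NAME = "index-version.json"


def write_index_version(index_dir, features=EXPECTED_FEATURES):
    """Record the feature versions that were in effect when the index was built."""
    Path(index_dir, MARKER_NAME).write_text(json.dumps(features))


def index_is_usable(index_dir, expected=EXPECTED_FEATURES):
    """Return True only if the stored feature versions match the expected ones."""
    marker = Path(index_dir, MARKER_NAME)
    if not marker.exists():
        # An index built before versioning existed is treated as invalid.
        return False
    try:
        stored = json.loads(marker.read_text())
    except ValueError:
        return False
    return stored == expected
```

A frontend could call index_is_usable() before searching and offer to rebuild the index when it returns False.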
In Him,
DM
On Mar 3, 2013, at 8:36 AM, Chris Burrell <chris at burrell.me.uk> wrote:
> I still think normalisation of what is searched for would be good, in that it basically means the user sees the results that he is looking for.
>
> I understand the concern for backwards compatibility and perhaps that means frontends should be able to turn this normalisation off. But looking ahead, for new front-ends, front-ends that can make rebuilding indexes part of the upgrade to a new version and for all new downloads of frontends, this has to be a benefit.
>
> Not normalising seems to me like perpetuating an existing problem into all new downloads from this day forth. Or am I missing something?
> Chris
>
>
>
> On 3 March 2013 12:53, Jonathan Morgan <jonmmorgan at gmail.com> wrote:
> Another possibly related normalisation problem which BPBible at least has an open issue about is Caesar vs. Cæsar. Theoretically I guess you want either search to match both forms. I don't know how Lucene etc. deals with this (if at all).
>
> Jon
>
>
> On Mon, Feb 25, 2013 at 2:48 AM, David Haslam <dfhmch at googlemail.com> wrote:
> In the KJV module, if you want to search for [say] the hyphenated name
> "Maher–shalal–hash–baz", you first have to be aware that this module uses
> the ndash in place of the hyphen.
>
> btw. It's not so easy to enter the ndash from a keyboard, and probably even
> harder on an Android tablet or mobile.
>
> If you use ordinary hyphen/minus for the search key hyphen for this module,
> you don't find anything with "Exact phrase".
> If you use "Multi-word", you do find "Maher" highlighted in the found verse.
> (e.g. using Xiphos).
>
> For modules in general, however, the user cannot usually know in advance
> whether hyphenated words use the ndash, the hyphen or something else.
>
> Has anyone else looked into this aspect of the search feature?
>
> David
>
> --
> View this message in context: http://sword-dev.350566.n4.nabble.com/Searching-for-hyphenated-words-tp4652016.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page