[jsword-devel] False search hits with certain locales

DM Smith dmsmith at crosswire.org
Thu Feb 9 12:59:02 MST 2012


Yes, in both getPassage and getVerse it is better to use getOsisID 
rather than getName.
You can also use getOsisRef, which will result in a more compact 
representation than getOsisID for ranges. So I actually recommend 
getOsisRef over getOsisID (but test it first).

It would be good to make the same change in the exception block, too. 
That way errors are reported against what was actually parsed.
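
To illustrate why the round trip through the localized name goes wrong, here is a toy sketch. It is not JSword code: the class, the naive parser and the "Johannes" name for John are made up for illustration, and only the Rev name and the getName/getOsisID method names come from this thread. It assumes the parser stops reading the book name at the first non-letter, which is what the apostrophe appears to trigger.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: none of this is JSword code. It mimics the failure
// described in this thread under simplified, assumed parsing rules.
class LocaleRoundTripSketch {
    // Localized full book names keyed by OSIS book id.
    // "Rev" is the nn value quoted in this thread; "John" is an assumption.
    static final Map<String, String> NN_FULL_NAMES = new LinkedHashMap<>();
    static {
        NN_FULL_NAMES.put("John", "Johannes");
        NN_FULL_NAMES.put("Rev", "Johannes\u2019 openberring");
    }

    // Hypothetical stand-in for a verse key.
    static final class ToyVerse {
        final String book;
        final int chapter;
        final int verse;

        ToyVerse(String book, int chapter, int verse) {
            this.book = book;
            this.chapter = chapter;
            this.verse = verse;
        }

        // Locale-dependent display name, e.g. "Johannes' openberring 22:8".
        String getName() {
            return NN_FULL_NAMES.get(book) + " " + chapter + ":" + verse;
        }

        // Locale-independent identifier, e.g. "Rev.22.8".
        String getOsisID() {
            return book + "." + chapter + "." + verse;
        }
    }

    // Assumed naive parser: reads letters until the first non-letter and
    // looks the token up as a book name. The apostrophe ends the token early.
    static String naiveBookFromName(String name) {
        int end = 0;
        while (end < name.length() && Character.isLetter(name.charAt(end))) {
            end++;
        }
        String token = name.substring(0, end);
        for (Map.Entry<String, String> e : NN_FULL_NAMES.entrySet()) {
            if (e.getValue().equals(token)) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("Key can't be a passage: " + name);
    }

    // Parsing the OSIS id needs no locale data at all.
    static String bookFromOsisID(String osisID) {
        return osisID.substring(0, osisID.indexOf('.'));
    }

    public static void main(String[] args) {
        ToyVerse rev = new ToyVerse("Rev", 22, 8);
        System.out.println(naiveBookFromName(rev.getName()));
        System.out.println(bookFromOsisID(rev.getOsisID()));
    }
}
```

In this toy, the localized name of Rev 22:8 resolves to the book John, while the OSIS id resolves to Rev, which is the reason to prefer getOsisID or getOsisRef when re-parsing a key.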

-- DM

On 02/09/2012 02:24 PM, Martin Denham wrote:
> Thanks for that DM.  I was hoping to get by without your code change 
> because an And Bible release is coming up but I had to use it to fix 
> another localisation search problem in Thai.  With Thai selected, 
> searching for 'John' in ESV returned only 83 hits instead of 132.  I peered 
> at the Thai localisation but I could see no obvious issues -  it is 
> quite an amazing language.
>
> This seems quite an important fix.  I notice that the getVerse method 
> in KeyUtil also uses getName instead of getOsisID.  Should getVerse be 
> changed too?
> I also see that the AV11N version in svn uses getName instead of 
> getOsisID - will that have the same problem?
>
> ...A bit later - after more testing...
>
> While testing the Thai localisation I noticed that I could not display 
> 1 Thess in BWE, EMTV, Murdock but other modules like KJV, ESV, 
> GodsWord worked.  There was an error message:
>         Key can't be a verse: 1?????????? 1
> which seems to come from the getVerse method I mentioned at the top of 
> this e-mail.  So I changed getName to getOsisID in the getVerse method 
> too and that seems to fix that.  Could you confirm that it is correct 
> to use getOsisID in both methods?
>
> Thanks
> Martin
>
>
> On 8 February 2012 20:44, DM Smith <dmsmith at crosswire.org 
> <mailto:dmsmith at crosswire.org>> wrote:
>
>     The simplest fix (though not right in an av11n context) is to change
>     KeyUtil.getPassage(Key)
>     from
>             try {
>                 ref = keyf.getKey(key.getName());
>             } catch (NoSuchKeyException ex) {
>                 log.warn("Key can't be a passage: " + key.getName());
>                 ref = keyf.createEmptyKeyList();
>             }
>     to
>             try {
>                 ref = keyf.getKey(key.getOsisID());
>             } catch (NoSuchKeyException ex) {
>                 log.warn("Key can't be a passage: " + key.getName());
>                 ref = keyf.createEmptyKeyList();
>             }
>
>     On 02/08/2012 03:14 PM, DM Smith wrote:
>>     On 02/08/2012 03:05 PM, Martin Denham wrote:
>>>     I found the problem:
>>>     Rev.Full = Johannes\u2019 openberring
>>>
>>>     \u2019 is an apostrophe and so it was matching Johannes'
>>>     openberring 22:8 but stopping at the apostrophe which of course
>>>     matched the whole of John.
>>>
>>>     Best regards
>>>     Martin
>>
>>     That's part of it. I just looked at it and the bigger, other part
>>     is that KeyUtil.getPassage(Key) tries to cast a Key to a passage.
>>     It should not be called on a Verse or a VerseRange, because it
>>     gets the localized name of the Verse and then tries to parse that
>>     back into a Passage.
>>
>>     It didn't need to convert the verse to Norwegian, and then
>>     re-parse it, incorrectly into a passage. It had everything it
>>     needed in the Verse.
>>
>>     I've got to think about that for a bit to figure out the best way
>>     to fix it and where. Fixing it will be a performance improvement
>>     in general.
>>
>>     The apostrophe in the name will cause other problems in JSword.
>>     Likewise for other "punctuation". But that is another problem.
>>
>>     In Him,
>>         DM
>>
>>>
>>>
>>>     On 8 February 2012 19:23, Martin Denham <mjdenham at gmail.com
>>>     <mailto:mjdenham at gmail.com>> wrote:
>>>
>>>         I have just noticed that I have not fixed the problem.  I am
>>>         now getting an error on the final hit 'Key can't be a
>>>         passage' - I don't know what that means:
>>>         02-08 19:05:34.105: I/System.out(22191): 129 found:Johannes'
>>>         openberring 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>>>         02-08 19:05:34.105: I/System.out(22191): 130 found:Johannes'
>>>         openberring 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>>>         02-08 19:05:34.105: I/System.out(22191): 131 found:Johannes'
>>>         openberring 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>>>         02-08 19:05:34.145: I/System.out(22191): JSword:Key can't be
>>>         a passage: Johannes' openberring 22:8
>>>         02-08 19:05:34.155: I/System.out(22191): 132 found:Johannes'
>>>         openberring 22:8 docid=31071 docbase=0 key.card:1 res.card=131
>>>
>>>         To log the cardinality I just added a println in the
>>>         VerseCollector as below:
>>>                     Key key = VerseFactory.fromString(doc.get(LuceneIndex.FIELD_KEY));
>>>                     results.addAll(key);
>>>                     System.out.println(++count + " found:" + key.getName()
>>>                             + " docid=" + docId + " docbase=" + docBase
>>>                             + " key.card:" + key.getCardinality()
>>>                             + " res.card=" + results.getCardinality());
>>>
>>>         The problem is I can't see the bug on Windows, only when
>>>         running on my Android phone, so I am not sure anybody
>>>         without an Android will be able to reproduce the problem easily.
>>>
>>>         Martin
>>>
>>>         On 8 February 2012 19:04, DM Smith <dmsmith at crosswire.org
>>>         <mailto:dmsmith at crosswire.org>> wrote:
>>>
>>>             I've been trying to get to it, but haven't been able to do
>>>             so. I'd be interested in your code to log the cardinality.
>>>             -- DM
>>>
>>>
>>>             On 02/08/2012 01:54 PM, Martin Denham wrote:
>>>>             I don't know what is going on but I have done more
>>>>             analysis and found a fix for Nynorsk, but I think it is
>>>>             affecting other locales like Japanese which I can't
>>>>             explain.
>>>>
>>>>             Test: search for 'John' in NT in And Bible with locale
>>>>             set to nn
>>>>             Result: 1389 hits including every verse in the gospel
>>>>             of John
>>>>             Observation: I logged the cardinality of the results
>>>>             var in VerseCollector and you can see that it jumps
>>>>             from 131 to 1389 on the last hit in Rev.22.8:
>>>>             02-08 18:18:15.895: I/System.out(21945): 127
>>>>             found:Apostelgjerningane 19:4 docid=27575 docbase=0
>>>>             key.card:1 res.card=127
>>>>             02-08 18:18:15.905: I/System.out(21945): 128
>>>>             found:Galatarane 2:9 docid=29073 docbase=0 key.card:1
>>>>             res.card=128
>>>>             02-08 18:18:15.905: I/System.out(21945): 129
>>>>             found:Johannes' openberring 1:1 docid=30681 docbase=0
>>>>             key.card:1 res.card=129
>>>>             02-08 18:18:15.915: I/System.out(21945): 130
>>>>             found:Johannes' openberring 1:4 docid=30684 docbase=0
>>>>             key.card:1 res.card=130
>>>>             02-08 18:18:15.915: I/System.out(21945): 131
>>>>             found:Johannes' openberring 1:9 docid=30689 docbase=0
>>>>             key.card:1 res.card=131
>>>>             02-08 18:18:15.965: I/System.out(21945): 132
>>>>             found:Johannes' openberring 22:8 docid=31071 docbase=0
>>>>             key.card:1 res.card=1389
>>>>
>>>>             Other words in Rev 22 seem to have the same effect e.g.
>>>>             month, behold, am,...
>>>>
>>>>             The fix for nn was to change
>>>>                Rev.Short=Op
>>>>             to
>>>>                Rev.Short=JoOp
>>>>
>>>>             Any idea what is happening?  I tried to write a junit
>>>>             on my pc but couldn't get it to fail on Windows.
>>>>
>>>>             I am using revision 2195 of JSword, which is before the
>>>>             AV changes.
>>>>
>>>>             Thanks
>>>>             Martin
>>>>
>>>>
>>>>             On 2 February 2012 11:20, DM Smith
>>>>             <dmsmith at crosswire.org <mailto:dmsmith at crosswire.org>>
>>>>             wrote:
>>>>
>>>>                 I'm trying to see what is happening. It doesn't
>>>>                 make sense to me either.
>>>>
>>>>                 Cent from my fone so theer mite be tipos. ;)
>>>>
>>>>                 On Jan 27, 2012, at 9:44 AM, Martin Denham
>>>>                 <mjdenham at gmail.com <mailto:mjdenham at gmail.com>> wrote:
>>>>
>>>>>                 Hi,
>>>>>
>>>>>                 I have received this error report for And Bible
>>>>>                 <http://code.google.com/p/and-bible/issues/detail?id=87> which
>>>>>                 has confused me.  I would be grateful for any
>>>>>                 suggestions wrt what might be happening.
>>>>>
>>>>>                 A simple test I have tried:
>>>>>
>>>>>                   * Set locale to de or en
>>>>>                   * Search for 'John' in ESV
>>>>>                   * Works fine
>>>>>                   * Set locale to nn (Norsk Nynorsk)
>>>>>                   * Search for 'John' in ESV
>>>>>                   * Every verse of John is returned in the result list
>>>>>
>>>>>                 Thanks
>>>>>                 Martin
>>>>
>>
>>
>>
>>     _______________________________________________
>>     jsword-devel mailing list
>>     jsword-devel at crosswire.org  <mailto:jsword-devel at crosswire.org>
>>     http://www.crosswire.org/mailman/listinfo/jsword-devel
>
