[jsword-devel] False search hits with certain locales

DM Smith dmsmith at crosswire.org
Wed Feb 8 13:44:44 MST 2012


The simplest fix, though not right in an av11n (alternate versification) context, is to change
KeyUtil.getPassage(Key)
from
         try {
             ref = keyf.getKey(key.getName());
         } catch (NoSuchKeyException ex) {
             log.warn("Key can't be a passage: " + key.getName());
             ref = keyf.createEmptyKeyList();
         }
to
         try {
             ref = keyf.getKey(key.getOsisID());
         } catch (NoSuchKeyException ex) {
             log.warn("Key can't be a passage: " + key.getName());
             ref = keyf.createEmptyKeyList();
         }
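A toy Java sketch of why the getOsisID() variant avoids the false hit. A localized name has to be re-parsed using locale-specific book names, and a parser that treats the book name as a run of letters stops at U+2019. An OSIS ID such as "Rev.22.8" splits the same way in every locale. RoundTripDemo, bookFromLocalizedName, partsFromOsisId and the two-entry Nynorsk table are hypothetical illustrations, not JSword code, and the real parser is more elaborate. Like the fix itself, the sketch ignores versification.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model only: these names are hypothetical and are NOT JSword's API.
// It contrasts re-parsing a localized display name (what getName() yields)
// with splitting a locale-independent OSIS ID (what getOsisID() yields).
final class RoundTripDemo {
    // Hypothetical slice of a Nynorsk book-name table: lowercase name -> OSIS book.
    private static final Map<String, String> NN_BOOKS = new HashMap<>();
    static {
        NN_BOOKS.put("johannes", "John");
        NN_BOOKS.put("johannes\u2019 openberring", "Rev");
    }

    // Naive parser: the book name is the leading run of letters.
    // U+2019 (right single quotation mark) is not a letter, so the name of
    // Rev 22:8 is cut to "Johannes" and the reference degrades to John.
    static String bookFromLocalizedName(String name) {
        int end = 0;
        while (end < name.length() && Character.isLetter(name.charAt(end))) {
            end++;
        }
        return NN_BOOKS.get(name.substring(0, end).toLowerCase(Locale.ROOT));
    }

    // An OSIS ID such as "Rev.22.8" uses fixed ASCII book abbreviations and
    // '.' separators, so splitting it gives the same answer in every locale.
    static String[] partsFromOsisId(String osisId) {
        return osisId.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(bookFromLocalizedName("Johannes\u2019 openberring 22:8"));
        System.out.println(String.join(" ", partsFromOsisId("Rev.22.8")));
    }
}
```

Run as-is, main prints John and then Rev 22 8: the localized round trip loses the book, while the OSIS ID keeps book, chapter and verse intact.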

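DM's quoted analysis below suggests the better fix is to not round-trip a Verse through its name at all. A minimal sketch of that idea, assuming simplified hypothetical Key, Verse, Passage and PassageUtil types (JSword's real Verse, VerseRange and Passage classes differ): when the key already is a passage or a verse, it is used directly and no locale-dependent parsing happens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for JSword's Key/Verse/Passage, reduced to the
// minimum needed to show the short-circuit; not the real class hierarchy.
interface Key {
    String getName(); // localized display name
}

final class Verse implements Key {
    final String osisBook;
    final int chapter;
    final int verse;

    Verse(String osisBook, int chapter, int verse) {
        this.osisBook = osisBook;
        this.chapter = chapter;
        this.verse = verse;
    }

    @Override
    public String getName() {
        return osisBook + " " + chapter + ":" + verse; // stands in for a localized name
    }
}

final class Passage implements Key {
    private final List<Verse> verses = new ArrayList<>();

    void add(Verse v) { verses.add(v); }

    List<Verse> verses() { return Collections.unmodifiableList(verses); }

    @Override
    public String getName() {
        return verses.stream().map(Verse::getName).collect(Collectors.joining("; "));
    }
}

final class PassageUtil {
    // Convert a key to a passage without formatting and re-parsing it
    // whenever the key already carries a structured reference.
    static Passage getPassage(Key key) {
        if (key instanceof Passage) {
            return (Passage) key;        // already a passage: nothing to do
        }
        Passage result = new Passage();
        if (key instanceof Verse) {
            result.add((Verse) key);     // no locale round trip, no parsing
            return result;
        }
        // Other key types would still need parsing from a string; this sketch
        // does not model that and returns an empty passage instead.
        return result;
    }
}
```

Because the Verse path never calls getName(), the apostrophe in the Nynorsk book name cannot affect it; skipping the format-and-parse step is also plausibly where the general performance gain would come from.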
On 02/08/2012 03:14 PM, DM Smith wrote:
> On 02/08/2012 03:05 PM, Martin Denham wrote:
>> I found the problem:
>> Rev.Full = Johannes\u2019 openberring
>>
>> \u2019 is an apostrophe, so parsing Johannes' openberring 22:8 
>> stopped at the apostrophe, and what was left ("Johannes") of course 
>> matched the whole of John.
>>
>> Best regards
>> Martin
>
> That's part of it. I just looked at it, and the other, bigger part is 
> that KeyUtil.getPassage(Key) tries to cast a Key to a passage. It 
> should not be called on a Verse or a VerseRange, because it gets the 
> localized name of the Verse and tries to convert that to a Passage.
>
> It didn't need to convert the verse to Norwegian and then re-parse 
> it (incorrectly) into a passage. It had everything it needed in the Verse.
>
> I've got to think about that for a bit to figure out the best way to 
> fix it and where. Fixing it will be a performance improvement in general.
>
> The apostrophe in the name will cause other problems in JSword. 
> Likewise for other "punctuation". But that is another problem.
>
> In Him,
>     DM
>
>>
>>
>> On 8 February 2012 19:23, Martin Denham <mjdenham at gmail.com> wrote:
>>
>>     I have just noticed that I have not fixed the problem.  I am now
>>     getting an error on the final hit 'Key can't be a passage' - I
>>     don't know what that means:
>>     02-08 19:05:34.105: I/System.out(22191): 129 found:Johannes' openberring 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>>     02-08 19:05:34.105: I/System.out(22191): 130 found:Johannes' openberring 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>>     02-08 19:05:34.105: I/System.out(22191): 131 found:Johannes' openberring 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>>     02-08 19:05:34.145: I/System.out(22191): JSword:Key can't be a passage: Johannes' openberring 22:8
>>     02-08 19:05:34.155: I/System.out(22191): 132 found:Johannes' openberring 22:8 docid=31071 docbase=0 key.card:1 res.card=131
>>
>>     To log the cardinality I just added a println in the
>>     VerseCollector as below:
>>         // Debug output: running hit count, key cardinality and result cardinality
>>         Key key = VerseFactory.fromString(doc.get(LuceneIndex.FIELD_KEY));
>>         results.addAll(key);
>>         System.out.println(++count + " found:" + key.getName()
>>                 + " docid=" + docId + " docbase=" + docBase
>>                 + " key.card:" + key.getCardinality()
>>                 + " res.card=" + results.getCardinality());
>>
>>     The problem is I can't see the bug on Windows, only when running
>>     on my Android phone, so I am not sure anybody without an Android
>>     will be able to reproduce the problem easily.
>>
>>     Martin
>>
>>     On 8 February 2012 19:04, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>         I've been trying to get to it, but haven't been able to do so.
>>         I'd be interested in your code to log the cardinality.
>>         -- DM
>>
>>
>>         On 02/08/2012 01:54 PM, Martin Denham wrote:
>>>         I don't know what is going on, but I have done more analysis
>>>         and found a fix for Nynorsk. I think the problem is also
>>>         affecting other locales like Japanese, which I can't explain.
>>>
>>>         Test: search for 'John' in NT in And Bible with locale set to nn
>>>         Result: 1389 hits including every verse in the gospel of John
>>>         Observation: I logged the cardinality of the results var in
>>>         VerseCollector and you can see that it jumps from 131 to
>>>         1389 on the last hit in Rev.22.8:
>>>         02-08 18:18:15.895: I/System.out(21945): 127 found:Apostelgjerningane 19:4 docid=27575 docbase=0 key.card:1 res.card=127
>>>         02-08 18:18:15.905: I/System.out(21945): 128 found:Galatarane 2:9 docid=29073 docbase=0 key.card:1 res.card=128
>>>         02-08 18:18:15.905: I/System.out(21945): 129 found:Johannes' openberring 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>>>         02-08 18:18:15.915: I/System.out(21945): 130 found:Johannes' openberring 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>>>         02-08 18:18:15.915: I/System.out(21945): 131 found:Johannes' openberring 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>>>         02-08 18:18:15.965: I/System.out(21945): 132 found:Johannes' openberring 22:8 docid=31071 docbase=0 key.card:1 res.card=1389
>>>
>>>         Other words in Rev 22 seem to have the same effect, e.g.
>>>         month, behold, am, ...
>>>
>>>         The fix for nn was to change
>>>            Rev.Short=Op
>>>         to
>>>            Rev.Short=JoOp
>>>
>>>         Any idea what is happening? I tried to write a JUnit test on
>>>         my PC but couldn't get it to fail on Windows.
>>>
>>>         I am using revision 2195 of JSword, which is before the AV
>>>         changes.
>>>
>>>         Thanks
>>>         Martin
>>>
>>>
>>>         On 2 February 2012 11:20, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>             I'm trying to see what is happening. It doesn't make
>>>             sense to me either.
>>>
>>>             Cent from my fone so theer mite be tipos. ;)
>>>
>>>             On Jan 27, 2012, at 9:44 AM, Martin Denham
>>>             <mjdenham at gmail.com> wrote:
>>>
>>>>             Hi,
>>>>
>>>>             I have received this error report for And Bible
>>>>             <http://code.google.com/p/and-bible/issues/detail?id=87> which
>>>>             has confused me.  I would be grateful for any
>>>>             suggestions wrt what might be happening.
>>>>
>>>>             A simple test I have tried:
>>>>
>>>>               * Set locale to de or en
>>>>               * Search for 'John' in ESV
>>>>               * Works fine
>>>>               * Set locale to nn (Norsk Nynorsk)
>>>>               * Search for 'John' in ESV
>>>>               * Every verse of John is returned in the result list
>>>>
>>>>             Thanks
>>>>             Martin
>>>
>
>
>
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel
