[jsword-devel] False search hits with certain locales

Martin Denham mjdenham at gmail.com
Thu Feb 9 12:24:08 MST 2012


Thanks for that, DM.  I was hoping to get by without your code change
because an And Bible release is coming up, but I had to use it to fix
another localisation search problem, this time in Thai.  With Thai selected,
searching for 'John' in ESV returned only 83 hits instead of 132.  I peered
at the Thai localisation but could see no obvious issues - it is quite an
amazing language.

This seems quite an important fix.  I notice that the getVerse method in
KeyUtil also uses getName instead of getOsisID.  Should getVerse be changed
too?
I also see that the AV11N version in svn uses getName instead of getOsisID
- will that have the same problem?

...A bit later - after more testing...

While testing the Thai localisation I noticed that I could not display 1
Thess in BWE, EMTV, Murdock but other modules like KJV, ESV, GodsWord
worked.  There was an error message:
        Key can't be a verse: 1เธสะโลนิกา 1
which seems to come from the getVerse method I mentioned at the top of this
e-mail.  So I changed getName to getOsisID in the getVerse method too and
that seems to fix that.  Could you confirm that it is correct to use
getOsisID in both methods?
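
As an illustration of why the round trip through getName is fragile, here
is a toy sketch in Java.  It is not JSword code: the class
LocalizedNameRoundTrip, its two-entry book table and the naive parser are
all made up for this example, and the real parser is more involved.  It
only mimics the symptom described further down the thread, where parsing
stops at the apostrophe in the localized name of Revelation and resolves to
John, whereas the OSIS ID carries no locale-dependent text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the name round trip; hypothetical, not part of JSword. */
final class LocalizedNameRoundTrip {

    /** Hypothetical book table: OSIS book id -> localized full name. */
    static final Map<String, String> FULL_NAMES = new LinkedHashMap<>();
    static {
        FULL_NAMES.put("John", "Johannes");
        // U+2019 (right single quotation mark) is used as the apostrophe.
        FULL_NAMES.put("Rev", "Johannes\u2019 openberring");
    }

    /** Locale-dependent rendering: localized book name, then chapter:verse. */
    public static String localizedName(String osisBook, int chapter, int verse) {
        return FULL_NAMES.get(osisBook) + " " + chapter + ":" + verse;
    }

    /** Locale-independent rendering, e.g. Rev.22.8. */
    public static String osisId(String osisBook, int chapter, int verse) {
        return osisBook + "." + chapter + "." + verse;
    }

    /**
     * Naive parser (an assumption made for this sketch): the book name ends
     * at the first character that is not a letter, so an apostrophe cuts
     * the name short. Returns the OSIS book id it resolved to, or null.
     */
    public static String naiveBookFromLocalizedName(String name) {
        int end = 0;
        while (end < name.length() && Character.isLetter(name.charAt(end))) {
            end++;
        }
        String token = name.substring(0, end);
        for (Map.Entry<String, String> entry : FULL_NAMES.entrySet()) {
            if (entry.getValue().equals(token)) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** The OSIS id needs no name lookup: the book is the part before the first dot. */
    public static String bookFromOsisId(String osisId) {
        return osisId.substring(0, osisId.indexOf('.'));
    }

    public static void main(String[] args) {
        String name = localizedName("Rev", 22, 8);
        String id = osisId("Rev", 22, 8);
        System.out.println("via localized name: " + naiveBookFromLocalizedName(name));
        System.out.println("via OSIS id:        " + bookFromOsisId(id));
    }
}
```

With the table above, the localized name of Rev 22:8 resolves to John,
while the OSIS ID resolves to Rev.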

Thanks
Martin


On 8 February 2012 20:44, DM Smith <dmsmith at crosswire.org> wrote:

>  The simplest fix (though not right in an av11n context) is to change
> KeyUtil.getPassage(Key)
> from
>         try {
>             ref = keyf.getKey(key.getName());
>         } catch (NoSuchKeyException ex) {
>             log.warn("Key can't be a passage: " + key.getName());
>             ref = keyf.createEmptyKeyList();
>         }
> to
>         try {
>             ref = keyf.getKey(key.getOsisID());
>         } catch (NoSuchKeyException ex) {
>             log.warn("Key can't be a passage: " + key.getName());
>             ref = keyf.createEmptyKeyList();
>         }
>
> On 02/08/2012 03:14 PM, DM Smith wrote:
>
> On 02/08/2012 03:05 PM, Martin Denham wrote:
>
> I found the problem:
> Rev.Full = Johannes\u2019 openberring
>
>  \u2019 is an apostrophe and so it was matching Johannes' openberring
> 22:8 but stopping at the apostrophe which of course matched the whole of
> John.
>
>  Best regards
> Martin
>
>
> That's part of it. I just looked at it and the bigger, other part is that
> KeyUtil.getPassage(Key) tries to cast a Key to a passage. It should not be
> called on a Verse or a VerseRange, because it gets the locale-specific
> name of the Verse and tries to convert that to a Passage.
>
> It didn't need to convert the verse to Norwegian and then re-parse it,
> incorrectly, into a passage. It had everything it needed in the Verse.
>
> I've got to think about that for a bit to figure out the best way to fix
> it and where. Fixing it will be a performance improvement in general.
>
> The apostrophe in the name will cause other problems in JSword. Likewise
> for other "punctuation". But that is another problem.
>
> In Him,
>     DM
>
>
>
> On 8 February 2012 19:23, Martin Denham <mjdenham at gmail.com> wrote:
>
>> I have just noticed that I have not fixed the problem.  I am now getting
>> an error on the final hit 'Key can't be a passage' - I don't know what that
>> means:
>>  02-08 19:05:34.105: I/System.out(22191): 129 found:Johannes'
>> openberring 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>> 02-08 19:05:34.105: I/System.out(22191): 130 found:Johannes' openberring
>> 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>> 02-08 19:05:34.105: I/System.out(22191): 131 found:Johannes' openberring
>> 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>> 02-08 19:05:34.145: I/System.out(22191): JSword:Key can't be a passage:
>> Johannes' openberring 22:8
>> 02-08 19:05:34.155: I/System.out(22191): 132 found:Johannes' openberring
>> 22:8 docid=31071 docbase=0 key.card:1 res.card=131
>>
>>  To log the cardinality I just added a println in the VerseCollector as
>> below:
>>             Key key = VerseFactory.fromString(doc.get(LuceneIndex.FIELD_KEY));
>>             results.addAll(key);
>>             System.out.println(++count + " found:" + key.getName()
>>                     + " docid=" + docId + " docbase=" + docBase
>>                     + " key.card:" + key.getCardinality()
>>                     + " res.card=" + results.getCardinality());
>>
>>  The problem is I can't see the bug on Windows, only when running on my
>> Android phone, so I am not sure anybody without an Android will be able to
>> reproduce the problem easily.
>>
>>  Martin
>>
>> On 8 February 2012 19:04, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>>  I've been trying to get to it, but haven't been able to do so. I'd be
>>> interested in your code to log the cardinality.
>>> -- DM
>>>
>>>
>>> On 02/08/2012 01:54 PM, Martin Denham wrote:
>>>
>>> I don't know what is going on but I have done more analysis and found a
>>> fix for Nynorsk, but I think it is affecting other locales like Japanese
>>> which I can't explain.
>>>
>>>  Test: search for 'John' in NT in And Bible with locale set to nn
>>> Result: 1389 hits including every verse in the gospel of John
>>> Observation: I logged the cardinality of the results var in
>>> VerseCollector and you can see that it jumps from 131 to 1389 on the last
>>> hit in Rev.22.8:
>>> 02-08 18:18:15.895: I/System.out(21945): 127 found:Apostelgjerningane
>>> 19:4 docid=27575 docbase=0 key.card:1 res.card=127
>>>  02-08 18:18:15.905: I/System.out(21945): 128 found:Galatarane 2:9
>>> docid=29073 docbase=0 key.card:1 res.card=128
>>> 02-08 18:18:15.905: I/System.out(21945): 129 found:Johannes' openberring
>>> 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>>> 02-08 18:18:15.915: I/System.out(21945): 130 found:Johannes' openberring
>>> 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>>> 02-08 18:18:15.915: I/System.out(21945): 131 found:Johannes' openberring
>>> 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>>> 02-08 18:18:15.965: I/System.out(21945): 132 found:Johannes' openberring
>>> 22:8 docid=31071 docbase=0 key.card:1 res.card=1389
>>>
>>>  Other words in Rev 22 seem to have the same effect e.g. month, behold,
>>> am,...
>>>
>>>  The fix for nn was to change
>>>     Rev.Short=Op
>>>  to
>>>    Rev.Short=JoOp
>>>
>>>  Any idea what is happening?  I tried to write a junit on my pc but
>>> couldn't get it to fail on Windows.
>>>
>>>  I am using revision 2195 of JSword, which is before the AV changes.
>>>
>>>  Thanks
>>> Martin
>>>
>>>
>>> On 2 February 2012 11:20, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>>  I'm trying to see what is happening. It doesn't make sense to me
>>>> either.
>>>>
>>>> Cent from my fone so theer mite be tipos. ;)
>>>>
>>>> On Jan 27, 2012, at 9:44 AM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>
>>>>  Hi,
>>>>
>>>>  I have received this error report for And Bible
>>>> (http://code.google.com/p/and-bible/issues/detail?id=87), which has
>>>> confused me.  I would be grateful for any suggestions wrt what might be
>>>> happening.
>>>>
>>>>  A simple test I have tried:
>>>>
>>>>    - Set locale to de or en
>>>>    - Search for 'John' in ESV
>>>>    - Works fine
>>>>    - Set locale to nn (Norsk Nynorsk)
>>>>    - Search for 'John' in ESV
>>>>    - Every verse of John is returned in the result list
>>>>
>>>> Thanks
>>>>  Martin
>>>>
>>>>
>
>
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel
>
>