[jsword-devel] False search hits with certain locales

Martin Denham mjdenham at gmail.com
Thu Feb 9 15:03:31 MST 2012


Thanks DM,

getOsisRef seems to work fine.

I have just made the changes locally because JSword has moved on a lot.

Martin

On 9 February 2012 19:59, DM Smith <dmsmith at crosswire.org> wrote:

>  Yes, in both getPassage and getVerse it is better to use getOsisID rather
> than getName.
> You can also use getOsisRef, which will result in a more compact
> representation than getOsisID for ranges. So I actually recommend
> getOsisRef over getOsisID (but test it first).
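DM's point about compactness can be shown with a hypothetical, simplified model of a verse range within one chapter. It assumes, following OSIS conventions, that an osisID-style string lists every verse separately, while an osisRef-style string can express a contiguous range as start-end. The class and method names are illustrative and are not JSword's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a verse range inside one chapter.
// It only contrasts the two string forms; it is not JSword's VerseRange.
final class RangeDemo {
    // osisID style (assumed): every verse listed, separated by spaces.
    static String osisIdOf(String book, int chapter, int firstVerse, int lastVerse) {
        List<String> ids = new ArrayList<>();
        for (int v = firstVerse; v <= lastVerse; v++) {
            ids.add(book + "." + chapter + "." + v);
        }
        return String.join(" ", ids);
    }

    // osisRef style (assumed): a single verse, or "start-end" for a range.
    static String osisRefOf(String book, int chapter, int firstVerse, int lastVerse) {
        String start = book + "." + chapter + "." + firstVerse;
        if (firstVerse == lastVerse) {
            return start;
        }
        return start + "-" + book + "." + chapter + "." + lastVerse;
    }

    public static void main(String[] args) {
        System.out.println(osisIdOf("Rev", 22, 8, 11));  // Rev.22.8 Rev.22.9 Rev.22.10 Rev.22.11
        System.out.println(osisRefOf("Rev", 22, 8, 11)); // Rev.22.8-Rev.22.11
    }
}
```

For a single verse the two forms coincide; the difference only shows up for ranges.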
>
> It would be good to make the change in the exception block, too.
> That way errors are reported against what was actually parsed.
>
> -- DM
>
>
> On 02/09/2012 02:24 PM, Martin Denham wrote:
>
> Thanks for that, DM. I was hoping to get by without your code change
> because an And Bible release is coming up, but I had to use it to fix
> another localisation search problem in Thai. When Thai is selected,
> searching for 'John' in ESV returned only 83 hits instead of 132. I peered at
> the Thai localisation but could see no obvious issues; it is quite an
> amazing language.
>
> This seems quite an important fix. I notice that the getVerse method in
> KeyUtil also uses getName instead of getOsisID. Should getVerse be changed
> too?
> I also see that the AV11N version in svn uses getName instead of getOsisID.
> Will that have the same problem?
>
>  ...A bit later - after more testing...
>
> While testing the Thai localisation, I noticed that I could not display
> 1 Thess in BWE, EMTV or Murdock, although other modules such as KJV, ESV
> and GodsWord worked. There was an error message:
>          Key can't be a verse: 1เธสะโลนิกา 1
>  which seems to come from the getVerse method I mentioned at the top of
> this e-mail.  So I changed getName to getOsisID in the getVerse method too
> and that seems to fix that.  Could you confirm that it is correct to use
> getOsisID in both methods?
>
>  Thanks
> Martin
>
>
> On 8 February 2012 20:44, DM Smith <dmsmith at crosswire.org> wrote:
>
>>  The simplest fix (though not right in an av11n context) is to change
>> KeyUtil.getPassage(Key)
>> from
>>         try {
>>             ref = keyf.getKey(key.getName());
>>         } catch (NoSuchKeyException ex) {
>>             log.warn("Key can't be a passage: " + key.getName());
>>             ref = keyf.createEmptyKeyList();
>>         }
>> to
>>         try {
>>             ref = keyf.getKey(key.getOsisID());
>>         } catch (NoSuchKeyException ex) {
>>             log.warn("Key can't be a passage: " + key.getName());
>>             ref = keyf.createEmptyKeyList();
>>         }
>>
>> On 02/08/2012 03:14 PM, DM Smith wrote:
>>
>>  On 02/08/2012 03:05 PM, Martin Denham wrote:
>>
>> I found the problem:
>> Rev.Full = Johannes\u2019 openberring
>>
>> \u2019 is an apostrophe, so when parsing Johannes' openberring 22:8 it
>> stopped at the apostrophe, and "Johannes" on its own of course matched the
>> whole of John.
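The failure Martin describes can be reproduced with a deliberately naive, hypothetical parser (not JSword's) that ends the book name at the first character that is neither a letter nor a space. Given the Nynorsk display name for Rev 22:8, it stops at the right single quotation mark (\u2019) and resolves "Johannes" to the whole book of John. Given the OSIS ID, which is what the getOsisID change in this thread passes to the parser, it resolves the single verse. The two book names come from the thread; everything else is assumed for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the bug; the parser is deliberately naive and is not JSword's.
final class RoundTripDemo {
    // Nynorsk book names keyed by OSIS book name (the two values quoted in the thread).
    static final Map<String, String> NN_NAMES = new LinkedHashMap<>();
    static {
        NN_NAMES.put("John", "Johannes");
        NN_NAMES.put("Rev", "Johannes\u2019 openberring");
    }

    // Chapter and verse, separated by '.' (OSIS form) or ':' (display form).
    private static final Pattern CHAPTER_VERSE = Pattern.compile("(\\d+)[.:](\\d+)");

    // The book name ends at the first character that is neither a letter nor a space.
    // Returns "Book.C.V", or just "Book" (the whole book) if no chapter:verse follows.
    static String naiveParse(String text) {
        int end = 0;
        while (end < text.length()
                && (Character.isLetter(text.charAt(end)) || text.charAt(end) == ' ')) {
            end++;
        }
        String book = resolveBook(text.substring(0, end).trim());
        if (book == null) {
            throw new IllegalArgumentException("Unknown book in: " + text);
        }
        String rest = text.substring(end).replaceFirst("^[.\\s]+", "");
        Matcher m = CHAPTER_VERSE.matcher(rest);
        return m.matches() ? book + "." + m.group(1) + "." + m.group(2) : book;
    }

    // Accepts an OSIS book name or a Nynorsk display name.
    private static String resolveBook(String name) {
        if (NN_NAMES.containsKey(name)) {
            return name;
        }
        for (Map.Entry<String, String> e : NN_NAMES.entrySet()) {
            if (e.getValue().equalsIgnoreCase(name)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Display form (as getName() produced for nn) versus the OSIS ID.
        System.out.println(naiveParse(NN_NAMES.get("Rev") + " 22:8")); // John: the whole book
        System.out.println(naiveParse("Rev.22.8"));                     // Rev.22.8
    }
}
```

A name without punctuation, such as "Johannes 3:16", parses correctly even with this naive parser, which fits the observation that only some locales were affected.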
>>
>>  Best regards
>> Martin
>>
>>
>> That's part of it. I just looked at it, and the other, bigger part is that
>> KeyUtil.getPassage(Key) tries to convert a Key into a passage. It should not
>> be called on a Verse or a VerseRange, because it takes the localized name of
>> the Verse and tries to convert that back into a Passage.
>>
>> It didn't need to convert the verse to Norwegian and then re-parse it,
>> incorrectly, into a passage. It had everything it needed in the Verse.
>>
>> I've got to think about that for a bit to figure out the best way to fix
>> it and where. Fixing it will be a performance improvement in general.
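DM's alternative, not re-parsing a key that is already a Verse, can be sketched with hypothetical stand-in types. Key, Verse and Passage here are simplified illustrations, not JSword's classes. The sketch builds the passage directly from the verse object, so the localized name is never parsed back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the key types discussed; not JSword's API.
final class PassageShortCircuit {
    // Minimal key: getName() is the localized display name.
    interface Key {
        String getName();
    }

    static final class Verse implements Key {
        final String osisId;
        private final String displayName;

        Verse(String osisId, String displayName) {
            this.osisId = osisId;
            this.displayName = displayName;
        }

        @Override
        public String getName() {
            return displayName;
        }
    }

    static final class Passage implements Key {
        final List<Verse> verses = new ArrayList<>();

        @Override
        public String getName() {
            List<String> names = new ArrayList<>();
            for (Verse v : verses) {
                names.add(v.getName());
            }
            return String.join("; ", names);
        }

        int getCardinality() {
            return verses.size();
        }
    }

    // Builds a passage from a key without formatting it as localized text first.
    static Passage getPassage(Key key) {
        if (key instanceof Passage) {
            return (Passage) key; // already a passage: nothing to convert
        }
        Passage passage = new Passage();
        if (key instanceof Verse) {
            passage.verses.add((Verse) key); // the Verse already identifies itself
            return passage;
        }
        // Any other kind of key: this sketch logs and returns an empty passage
        // instead of re-parsing getName().
        System.err.println("Key can't be a passage: " + key.getName());
        return passage;
    }

    public static void main(String[] args) {
        Verse rev228 = new Verse("Rev.22.8", "Johannes\u2019 openberring 22:8");
        Passage p = getPassage(rev228);
        System.out.println(p.getCardinality());     // 1
        System.out.println(p.verses.get(0).osisId); // Rev.22.8
    }
}
```

Because nothing is formatted or parsed on this path, it also fits DM's remark that the fix should improve performance in general.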
>>
>> The apostrophe in the name will cause other problems in JSword. Likewise
>> for other "punctuation". But that is another problem.
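As a rough way to find book names that may trip the parser in the way DM warns about, a hypothetical lint can load a localisation file with java.util.Properties and flag any value containing a character that is neither a letter, a digit nor a space. The key=value layout follows the Rev.Full and Rev.Short entries quoted in this thread; the check itself is an assumption, not something JSword provides.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical lint for book-name localisation entries (not part of JSword).
final class BookNameLint {
    // Returns "key: U+XXXX" for every character that is not a letter, digit or space.
    static List<String> suspiciousEntries(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText)); // also decodes backslash-u escapes
        List<String> problems = new ArrayList<>();
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            String value = props.getProperty(key);
            int i = 0;
            while (i < value.length()) {
                int cp = value.codePointAt(i);
                if (!Character.isLetterOrDigit(cp) && !Character.isSpaceChar(cp)) {
                    problems.add(key + ": U+" + String.format("%04X", cp));
                }
                i += Character.charCount(cp);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Entries as quoted in the thread, in properties-file syntax.
        String nn = "Rev.Full = Johannes\\u2019 openberring\nRev.Short=Op\n";
        System.out.println(suspiciousEntries(nn)); // [Rev.Full: U+2019]
    }
}
```

Character.isLetterOrDigit returns false for combining marks, so for scripts such as Thai the output is a list of entries to review rather than a list of definite errors.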
>>
>> In Him,
>>     DM
>>
>>
>>
>> On 8 February 2012 19:23, Martin Denham <mjdenham at gmail.com> wrote:
>>
>>> I have just noticed that I have not fixed the problem after all. I am now
>>> getting the error 'Key can't be a passage' on the final hit, and I don't
>>> know what that means:
>>> 02-08 19:05:34.105: I/System.out(22191): 129 found:Johannes' openberring 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>>> 02-08 19:05:34.105: I/System.out(22191): 130 found:Johannes' openberring 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>>> 02-08 19:05:34.105: I/System.out(22191): 131 found:Johannes' openberring 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>>> 02-08 19:05:34.145: I/System.out(22191): JSword:Key can't be a passage: Johannes' openberring 22:8
>>> 02-08 19:05:34.155: I/System.out(22191): 132 found:Johannes' openberring 22:8 docid=31071 docbase=0 key.card:1 res.card=131
>>>
>>>  To log the cardinality I just added a println in the VerseCollector as
>>> below:
>>>             Key key = VerseFactory.fromString(doc.get(LuceneIndex.FIELD_KEY));
>>>             results.addAll(key);
>>>             // Debug: log each hit with its key's cardinality and the running total
>>>             System.out.println(++count + " found:" + key.getName()
>>>                     + " docid=" + docId + " docbase=" + docBase
>>>                     + " key.card:" + key.getCardinality()
>>>                     + " res.card=" + results.getCardinality());
>>>
>>> The problem is that I can't see the bug on Windows, only when running on my
>>> Android phone, so I am not sure anybody without an Android device will be
>>> able to reproduce the problem easily.
>>>
>>>  Martin
>>>
>>> On 8 February 2012 19:04, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>>> I've been trying to get to it, but haven't been able to do so. I'd be
>>>> interested in your code to log the cardinality.
>>>> -- DM
>>>>
>>>>
>>>> On 02/08/2012 01:54 PM, Martin Denham wrote:
>>>>
>>>> I don't know what is going on, but I have done more analysis and found a
>>>> fix for Nynorsk. However, I think the problem also affects other locales,
>>>> like Japanese, which I can't explain.
>>>>
>>>>  Test: search for 'John' in NT in And Bible with locale set to nn
>>>> Result: 1389 hits including every verse in the gospel of John
>>>> Observation: I logged the cardinality of the results var in
>>>> VerseCollector and you can see that it jumps from 131 to 1389 on the last
>>>> hit in Rev.22.8:
>>>> 02-08 18:18:15.895: I/System.out(21945): 127 found:Apostelgjerningane 19:4 docid=27575 docbase=0 key.card:1 res.card=127
>>>> 02-08 18:18:15.905: I/System.out(21945): 128 found:Galatarane 2:9 docid=29073 docbase=0 key.card:1 res.card=128
>>>> 02-08 18:18:15.905: I/System.out(21945): 129 found:Johannes' openberring 1:1 docid=30681 docbase=0 key.card:1 res.card=129
>>>> 02-08 18:18:15.915: I/System.out(21945): 130 found:Johannes' openberring 1:4 docid=30684 docbase=0 key.card:1 res.card=130
>>>> 02-08 18:18:15.915: I/System.out(21945): 131 found:Johannes' openberring 1:9 docid=30689 docbase=0 key.card:1 res.card=131
>>>> 02-08 18:18:15.965: I/System.out(21945): 132 found:Johannes' openberring 22:8 docid=31071 docbase=0 key.card:1 res.card=1389
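The jump Martin points out can be spotted mechanically. This hypothetical helper reads debug lines in the format printed by his VerseCollector logging and reports any hit where res.card grew by more than that hit's key.card; the class name and approach are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the VerseCollector debug output shown in this thread.
final class CardinalityJumps {
    private static final Pattern CARDS =
            Pattern.compile("key\\.card:(\\d+) res\\.card=(\\d+)");

    // Returns the lines where res.card grew by more than that hit's key.card.
    static List<String> findJumps(List<String> logLines) {
        List<String> jumps = new ArrayList<>();
        int previous = -1; // no earlier hit seen yet
        for (String line : logLines) {
            Matcher m = CARDS.matcher(line);
            if (!m.find()) {
                continue; // e.g. the "Key can't be a passage" warning
            }
            int keyCard = Integer.parseInt(m.group(1));
            int resCard = Integer.parseInt(m.group(2));
            if (previous >= 0 && resCard - previous > keyCard) {
                jumps.add(line.trim());
            }
            previous = resCard;
        }
        return jumps;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "130 found:Johannes' openberring 1:4 docid=30684 docbase=0 key.card:1 res.card=130",
                "131 found:Johannes' openberring 1:9 docid=30689 docbase=0 key.card:1 res.card=131",
                "132 found:Johannes' openberring 22:8 docid=31071 docbase=0 key.card:1 res.card=1389");
        System.out.println(findJumps(log)); // only the Rev 22:8 line
    }
}
```

A hit that adds nothing new (res.card unchanged, as in the later log where 22:8 failed to parse) is not reported, since growth of zero never exceeds key.card.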
>>>>
>>>> Other words in Rev 22 seem to have the same effect, e.g. month, behold,
>>>> am, ...
>>>>
>>>>  The fix for nn was to change
>>>>     Rev.Short=Op
>>>>  to
>>>>    Rev.Short=JoOp
>>>>
>>>> Any idea what is happening? I tried to write a JUnit test on my PC but
>>>> couldn't get it to fail on Windows.
>>>>
>>>>  I am using revision 2195 of JSword, which is before the AV changes.
>>>>
>>>>  Thanks
>>>> Martin
>>>>
>>>>
>>>> On 2 February 2012 11:20, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>>>  I'm trying to see what is happening. It doesn't make sense to me
>>>>> either.
>>>>>
>>>>> Cent from my fone so theer mite be tipos. ;)
>>>>>
>>>>> On Jan 27, 2012, at 9:44 AM, Martin Denham <mjdenham at gmail.com> wrote:
>>>>>
>>>>>  Hi,
>>>>>
>>>>> I have received this error report for And Bible
>>>>> (http://code.google.com/p/and-bible/issues/detail?id=87), which has
>>>>> confused me. I would be grateful for any suggestions about what might be
>>>>> happening.
>>>>>
>>>>>  A simple test I have tried:
>>>>>
>>>>>    - Set locale to de or en
>>>>>    - Search for 'John' in ESV
>>>>>    - Works fine
>>>>>    - Set locale to nn (Norsk Nynorsk)
>>>>>    - Search for 'John' in ESV
>>>>>    - Every verse of John is returned in the result list
>>>>>
>>>>> Thanks
>>>>>  Martin
>>>>>
>>>>>
>>
>>
>>
>> _______________________________________________
>> jsword-devel mailing list
>> jsword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>
>>
>
>