[jsword-devel] Term counts is double what it should be

DM Smith dmsmith at crosswire.org
Thu Feb 7 11:25:25 MST 2013


Just put up a pull request.

The bug was in the !s.equals(termText) check. I also found inappropriate handling of errors.

In Him,
	DM

On Feb 7, 2013, at 12:33 PM, Chris Burrell <chris at burrell.me.uk> wrote:

> I see you've raised a bug in JIRA... Any pointers on the bug? http://www.crosswire.org/tracker/browse/JS-243
> 
> I can tell from Luke that this is the content:
> 
> G3588 G1063 G3745 G4270 G1519 G2251 G1319 G4270 G2443 G2192 G1223 G3588 G5281 G2532 G3588 G3874 G3588 G1124 G2192 G3588 G1680
> 
> which bears very little resemblance to what OsisUtil#getStrongsNumbers returns:
> 
> G5599 G453 G1052 G5101 G940 G5209 G3982 G3361 G3982 G3588 G225 G2596 G3739 G3788 G2424 G5547 G4270 G4717 G1722 G5213
> 
> Chris
> 
> 
> 
> On 7 February 2013 14:52, DM Smith <dmsmith at crosswire.org> wrote:
> There's a bug in StrongsNumberFilter. Looking at it now.
> -- DM
> 
> On Feb 7, 2013, at 9:03 AM, Chris Burrell <chris at burrell.me.uk> wrote:
> 
>> Yes - it is double.
>> 
>> Gal.3.1 gives me an explanation of. When I try with my own lucene indexes, they look alright. However, the pattern is not consistent. The counts are often correct. Haven't got an idea of proportion yet. (see explanation in screenshot below)
>> 
>> <image.png>
>> 
>> 
>> On 7 February 2013 14:01, DM Smith <dmsmith at crosswire.org> wrote:
>> Does Luke give you access to the counts? Are they double too? -- DM
>> 
>> On Feb 7, 2013, at 8:22 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>> 
>>> No doubt that would cause issues too, but my case here applies to most words, even those that are not split.
>>> 
>>> I think a term vector allows you to store the position/offsets of the terms in each document, so that you can accurately work out where it was in the original sentence/verse even though you may not have the original stored any longer. 
>>> 
>>> For the purpose of counts I don't think it's necessary, although I haven't tried without yet.
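
To make the term vector idea concrete, here is a minimal sketch of the kind of per-document data it holds: each term with the positions at which it occurs. This is plain Java for illustration only and is not Lucene's API; the class name TermVectorSketch and the method build are hypothetical, and the input is the last five Strong's numbers from the Rev.22.5 split example quoted below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of what a term vector records for ONE document.
// Not Lucene code: it only mimics the idea of term -> positions.
final class TermVectorSketch {

    /** Maps each whitespace-separated term to the token positions where it occurs. */
    static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Strong's numbers for "God giveth them light" (G5461 is the split word).
        Map<String, List<Integer>> tv = build("G3588 G2316 G5461 G846 G5461");
        // The frequency of a term is simply the number of recorded positions.
        tv.forEach((term, pos) ->
                System.out.println(term + " freq=" + pos.size() + " positions=" + pos));
    }
}
```

The per-document frequency is just the length of the position list, which is why positions are not strictly needed when only counts matter.
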
>>> Chris
>>> 
>>> 
>>> 
>>> On 7 February 2013 13:12, DM Smith <dmsmith at crosswire.org> wrote:
>>> Not sure if this is the problem:
>>> In the KJV, there are a lot of splits of Greek as it translates into English.
>>> 
>>> For example, in Rev 22.5 look at φωτιζει αυτους which translates directly into English as "gives light to them", but is translated in the KJV as "giveth them light", so "them" splits "giveth light":
>>> <verse osisID="Rev.22.5" sID="Rev.22.5"/>
>>> <w src="1" lemma="strong:G2532 tr:και" morph="robinson:CONJ">And</w>
>>> <w src="4" lemma="strong:G2071 tr:εσται" morph="robinson:V-FXI-3S">there shall be</w>
>>> <w src="3" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>>> <w src="2" lemma="strong:G3571 tr:νυξ" morph="robinson:N-NSF">night</w>
>>> <w src="5" lemma="strong:G1563 tr:εκει" morph="robinson:ADV">there</w>;
>>> <w src="6" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>>> <w src="9" lemma="strong:G2192 tr:εχουσιν" morph="robinson:V-PAI-3P">they</w>
>>> <w src="7" lemma="strong:G5532 tr:χρειαν" morph="robinson:N-ASF">need</w>
>>> <w src="8" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>>> <w src="10" lemma="strong:G3088 tr:λυχνου" morph="robinson:N-GSM">candle</w>,
>>> <w src="11" lemma="strong:G2532 tr:και" morph="robinson:CONJ">neither</w>
>>> <w src="12" lemma="strong:G5457 tr:φωτος" morph="robinson:N-GSN">light</w>
>>> <w src="13" lemma="strong:G2246 tr:ηλιου" morph="robinson:N-GSM">of the sun</w>;
>>> <w src="14" lemma="strong:G3754 tr:οτι" morph="robinson:CONJ">for</w>
>>> <w src="15" lemma="strong:G2962 tr:κυριος" morph="robinson:N-NSM">the Lord</w>
>>> <w src="16 17" lemma="strong:G3588 strong:G2316 tr:ο tr:θεος" morph="robinson:T-NSM robinson:N-NSM">God</w>
>>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S" type="x-split-3868">giveth</w>
>>> <w src="19" lemma="strong:G846 tr:αυτους" morph="robinson:P-APM">them</w>
>>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S" type="x-split-3868">light</w>:
>>> <w src="20" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>>> <w src="21" lemma="strong:G936 tr:βασιλευσουσιν" morph="robinson:V-FAI-3P">they shall reign</w>
>>> <w src="22" lemma="strong:G1519 tr:εις" morph="robinson:PREP">for</w>
>>> <w src="23 24" lemma="strong:G3588 strong:G165 tr:τους tr:αιωνας" morph="robinson:T-APM robinson:N-APM">ever</w>
>>> <w src="25 26" lemma="strong:G3588 strong:G165 tr:των tr:αιωνων" morph="robinson:T-GPM robinson:N-GPM">and ever</w>.
>>> <milestone type="x-strongsMarkup" resp="pdy 2003-12-31-00:30"/>
>>> <verse eID="Rev.22.5"/>
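
The effect of the split can be checked with a small sketch. It assumes a single verse of OSIS markup as input and reads only the src and lemma attributes of each <w> element; the class name SplitCount and its count method are hypothetical and not part of JSword. It counts Strong's numbers twice: once naively, and once treating a repeated src value as the second half of a split word.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not JSword code. Assumes the input is ONE verse,
// because src values restart in every verse.
final class SplitCount {
    private static final Pattern W = Pattern.compile("<w\\s+([^>]*)>");
    private static final Pattern SRC = Pattern.compile("src=\"([^\"]*)\"");
    private static final Pattern STRONG = Pattern.compile("strong:([GH]\\d+)");

    /** Counts Strong's numbers; if mergeSplits is true, a repeated src is counted once. */
    static Map<String, Integer> count(String osis, boolean mergeSplits) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<String> seenSrc = new HashSet<>();
        Matcher w = W.matcher(osis);
        while (w.find()) {
            String attrs = w.group(1);
            Matcher src = SRC.matcher(attrs);
            // Set.add returns false when the src value was already seen,
            // which marks a later half of a split word.
            boolean repeat = src.find() && !seenSrc.add(src.group(1));
            if (mergeSplits && repeat) {
                continue;
            }
            Matcher strong = STRONG.matcher(attrs);
            while (strong.find()) {
                counts.merge(strong.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "giveth them light" from Rev.22.5, trimmed to the attributes read here.
        String osis =
              "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">giveth</w>\n"
            + "<w src=\"19\" lemma=\"strong:G846\">them</w>\n"
            + "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">light</w>";
        System.out.println("naive:  " + count(osis, false));
        System.out.println("merged: " + count(osis, true));
    }
}
```

With the naive count G5461 appears twice for this phrase, while merging on src counts it once. That would only explain doubling for split words, not for every term.
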
>>> 
>>> BTW, I'm not sure what a TermVector is nor how it would be used.
>>> 
>>> In Him,
>>>         DM
>>> 
>>> On Feb 7, 2013, at 6:36 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>>> 
>>> > Hi
>>> >
>>> > Using Luke, and my own code to look at the indexes created by JSword shows that the term count is double what it should be...
>>> >
>>> > Any ideas why that might be? I can't quite follow the logic in StrongAnalyser, but I attempted to step through it in the debugger and it didn't look like it was double counting. I might need to do that again.
>>> >
>>> > DM, I haven't checked, but apparently the TermVector may not be what I'm using.
>>> >
>>> > Chris
>>> >
>>> > _______________________________________________
>>> > jsword-devel mailing list
>>> > jsword-devel at crosswire.org
>>> > http://www.crosswire.org/mailman/listinfo/jsword-devel
