[jsword-devel] Term counts is double what it should be

Chris Burrell chris at burrell.me.uk
Thu Feb 7 10:33:43 MST 2013


I see you've raised a bug in JIRA... Any pointers on the bug?
http://www.crosswire.org/tracker/browse/JS-243

I can tell from Luke that this is the content:

G3588 G1063 G3745 G4270 G1519 G2251 G1319 G4270 G2443 G2192 G1223 G3588
G5281 G2532 G3588 G3874 G3588 G1124 G2192 G3588 G1680

which bears very little resemblance to what OsisUtil#getStrongsNumbers
returns:

G5599 G453 G1052 G5101 G940 G5209 G3982 G3361 G3982 G3588 G225 G2596 G3739
G3788 G2424 G5547 G4270 G4717 G1722 G5213
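
One quick way to quantify how little the two lists above overlap is to count each Strong's number and intersect the two sets. The following is only a minimal sketch in plain Java. The class and method names are hypothetical and are not part of JSword or Lucene; the two input strings are copied from the lists above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of JSword or Lucene.
class StrongsListCompare {
    /** Splits a whitespace-separated list of Strong's numbers into tokens. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Counts how often each token occurs, keeping first-seen order. */
    static Map<String, Integer> frequencies(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the distinct tokens that occur in both lists, sorted. */
    static Set<String> shared(List<String> a, List<String> b) {
        Set<String> common = new TreeSet<>(a);
        common.retainAll(b);
        return common;
    }

    public static void main(String[] args) {
        // Content reported by Luke (copied from the message above).
        List<String> fromLuke = tokens(
            "G3588 G1063 G3745 G4270 G1519 G2251 G1319 G4270 G2443 G2192 G1223 G3588 "
            + "G5281 G2532 G3588 G3874 G3588 G1124 G2192 G3588 G1680");
        // Content returned by OsisUtil#getStrongsNumbers (copied from the message above).
        List<String> fromOsisUtil = tokens(
            "G5599 G453 G1052 G5101 G940 G5209 G3982 G3361 G3982 G3588 G225 G2596 G3739 "
            + "G3788 G2424 G5547 G4270 G4717 G1722 G5213");

        System.out.println("Luke frequencies:     " + frequencies(fromLuke));
        System.out.println("OsisUtil frequencies: " + frequencies(fromOsisUtil));
        System.out.println("Shared numbers:       " + shared(fromLuke, fromOsisUtil));
    }
}
```

For these two lists the only shared numbers are G3588 and G4270, which is consistent with the two sources describing different content.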

Chris



On 7 February 2013 14:52, DM Smith <dmsmith at crosswire.org> wrote:

> There's a bug in StrongsNumberFilter. Looking at it now.
> -- DM
>
> On Feb 7, 2013, at 9:03 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> Yes - it is double.
>
> Gal.3.1 gives me an explanation of. When I try with my own Lucene indexes,
> they look alright. However, the pattern is not consistent: the counts are
> often correct. I haven't got an idea of the proportion yet. (See explanation
> in screenshot below.)
>
> <image.png>
>
>
> On 7 February 2013 14:01, DM Smith <dmsmith at crosswire.org> wrote:
>
>> Does Luke give you access to the counts? Are they double too? -- DM
>>
>> On Feb 7, 2013, at 8:22 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>>
>> No doubt that would cause issues too, but my case here is actually for
>> most words, even those not split.
>>
>> I think a term vector allows you to store the position/offsets of the
>> terms in each document, so that you can accurately work out where it was in
>> the original sentence/verse even though you may not have the original
>> stored any longer.
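
The idea described above can be modelled without Lucene at all: for one document, record the token positions of every term, so that the in-document frequency is just the number of positions. This is a conceptual sketch only. The class and method names are hypothetical, it does not use or mirror Lucene's own term vector API, and it assumes whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical conceptual model of a per-document term vector; not Lucene's API.
class SimpleTermVector {
    /** Maps each term of one document to the token positions where it occurs. */
    static Map<String, List<Integer>> positions(String document) {
        Map<String, List<Integer>> vector = new TreeMap<>();
        String trimmed = document.trim();
        if (trimmed.isEmpty()) {
            return vector; // no tokens, so no entries
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            vector.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return vector;
    }

    /** The frequency of a term within the document is the number of recorded positions. */
    static int frequency(Map<String, List<Integer>> vector, String term) {
        List<Integer> found = vector.get(term);
        return found == null ? 0 : found.size();
    }
}
```

With such a structure the position of each occurrence can be recovered even when the original text is no longer stored. For plain counts only the list sizes matter, which fits the remark that a term vector may not be needed for counting.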
>>
>> For the purpose of counts I don't think it's necessary, although I
>> haven't tried without yet.
>> Chris
>>
>>
>>
>> On 7 February 2013 13:12, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> Not sure if this is the problem:
>>> In the KJV, there are a lot of splits of Greek as it translates into
>>> English.
>>>
>>> For example, in Rev 22.5 look at φωτιζει αυτους which translates
>>> directly into English as "gives light to them", but is translated in the
>>> KJV as "giveth them light", so "them" splits "giveth light":
>>> <verse osisID="Rev.22.5" sID="Rev.22.5"/>
>>> <w src="1" lemma="strong:G2532 tr:και" morph="robinson:CONJ">And</w>
>>> <w src="4" lemma="strong:G2071 tr:εσται" morph="robinson:V-FXI-3S">there
>>> shall be</w>
>>> <w src="3" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>>> <w src="2" lemma="strong:G3571 tr:νυξ" morph="robinson:N-NSF">night</w>
>>> <w src="5" lemma="strong:G1563 tr:εκει" morph="robinson:ADV">there</w>;
>>> <w src="6" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>>> <w src="9" lemma="strong:G2192 tr:εχουσιν"
>>> morph="robinson:V-PAI-3P">they</w>
>>> <w src="7" lemma="strong:G5532 tr:χρειαν" morph="robinson:N-ASF">need</w>
>>> <w src="8" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>>> <w src="10" lemma="strong:G3088 tr:λυχνου"
>>> morph="robinson:N-GSM">candle</w>,
>>> <w src="11" lemma="strong:G2532 tr:και" morph="robinson:CONJ">neither</w>
>>> <w src="12" lemma="strong:G5457 tr:φωτος"
>>> morph="robinson:N-GSN">light</w>
>>> <w src="13" lemma="strong:G2246 tr:ηλιου" morph="robinson:N-GSM">of the
>>> sun</w>;
>>> <w src="14" lemma="strong:G3754 tr:οτι" morph="robinson:CONJ">for</w>
>>> <w src="15" lemma="strong:G2962 tr:κυριος" morph="robinson:N-NSM">the
>>> Lord</w>
>>> <w src="16 17" lemma="strong:G3588 strong:G2316 tr:ο tr:θεος"
>>> morph="robinson:T-NSM robinson:N-NSM">God</w>
>>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S"
>>> type="x-split-3868">giveth</w>
>>> <w src="19" lemma="strong:G846 tr:αυτους" morph="robinson:P-APM">them</w>
>>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S"
>>> type="x-split-3868">light</w>:
>>> <w src="20" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>>> <w src="21" lemma="strong:G936 tr:βασιλευσουσιν"
>>> morph="robinson:V-FAI-3P">they shall reign</w>
>>> <w src="22" lemma="strong:G1519 tr:εις" morph="robinson:PREP">for</w>
>>> <w src="23 24" lemma="strong:G3588 strong:G165 tr:τους tr:αιωνας"
>>> morph="robinson:T-APM robinson:N-APM">ever</w>
>>> <w src="25 26" lemma="strong:G3588 strong:G165 tr:των tr:αιωνων"
>>> morph="robinson:T-GPM robinson:N-GPM">and ever</w>.
>>> <milestone type="x-strongsMarkup" resp="pdy 2003-12-31-00:30"/>
>>> <verse eID="Rev.22.5"/>
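
To illustrate the split hypothesis above, the sketch below counts Strong's numbers from `<w>` start tags in two ways: naively, once per `<w>` element, and with split parts merged so that the same `src` and Strong's number pair is counted once. It is an illustration under stated assumptions, not JSword's StrongsNumberFilter: the class and method names are hypothetical, it assumes `src` comes directly before `lemma` as in the markup above, and it assumes `src` identifies a source word within a single verse. The sample in `main` is abbreviated from Rev.22.5 with the `tr:` and `morph` values omitted.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not JSword's StrongsNumberFilter.
class SplitAwareStrongsCount {
    // Matches one <w ...> start tag and captures its src and lemma attributes (in that order).
    private static final Pattern W_TAG =
        Pattern.compile("<w\\s+src=\"([^\"]*)\"\\s+lemma=\"([^\"]*)\"[^>]*>");
    // Captures each Strong's number inside a lemma attribute value.
    private static final Pattern STRONG = Pattern.compile("strong:([GH]\\d+)");

    /**
     * Counts Strong's numbers in the OSIS text of a single verse.
     * When mergeSplits is true, a (src, number) pair is counted only once,
     * however many <w> elements carry it.
     */
    static Map<String, Integer> count(String osis, boolean mergeSplits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Matcher w = W_TAG.matcher(osis);
        while (w.find()) {
            String src = w.group(1);
            Matcher s = STRONG.matcher(w.group(2));
            while (s.find()) {
                String number = s.group(1);
                if (mergeSplits && !seen.add(src + "|" + number)) {
                    continue; // second half of a split word, already counted
                }
                counts.merge(number, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample =
            "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">giveth</w> "
            + "<w src=\"19\" lemma=\"strong:G846\">them</w> "
            + "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">light</w>";
        System.out.println("per element:   " + count(sample, false));
        System.out.println("splits merged: " + count(sample, true));
    }
}
```

On the abbreviated sample, the per-element count sees G5461 twice, while the merged count sees it once. This only shows how split words could inflate a count; it does not explain doubling for words that are not split.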
>>>
>>> BTW, I'm not sure what a TermVector is nor how it would be used.
>>>
>>> In Him,
>>>         DM
>>>
>>> On Feb 7, 2013, at 6:36 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> > Hi
>>> >
>>> > Using Luke and my own code to look at the indexes created by JSword
>>> shows that the term count is double what it should be...
>>> >
>>> > Any ideas why that might be? I can't quite follow the logic in
>>> StrongAnalyser, but I stepped through it in the debugger and it didn't
>>> look like it was double counting. I might need to do that again.
>>> >
>>> > DM, haven't checked, but apparently the TermVector may not be what I'm
>>> using..
>>> >
>>> > Chris
>>> >
>>> > _______________________________________________
>>> > jsword-devel mailing list
>>> > jsword-devel at crosswire.org
>>> > http://www.crosswire.org/mailman/listinfo/jsword-devel
>>>
>>>
>>
>>
>
>