[jsword-devel] Term counts is double what it should be

DM Smith dmsmith at crosswire.org
Thu Feb 7 07:52:23 MST 2013


There's a bug in StrongsNumberFilter. Looking at it now.
-- DM

On Feb 7, 2013, at 9:03 AM, Chris Burrell <chris at burrell.me.uk> wrote:

> Yes - it is double.
> 
> Gal.3.1 gives me an explanation of. When I try with my own Lucene indexes, they look all right. However, the pattern is not consistent: the counts are often correct. I don't yet have an idea of the proportion. (See the explanation in the screenshot below.)
> 
> <image.png>
> 
> 
> On 7 February 2013 14:01, DM Smith <dmsmith at crosswire.org> wrote:
> Does Luke give you access to the counts? Are they double too? -- DM
> 
> On Feb 7, 2013, at 8:22 AM, Chris Burrell <chris at burrell.me.uk> wrote:
> 
>> No doubt that would cause issues too, but my case here is actually for most words, even those not split.
>> 
>> I think a term vector allows you to store the position/offsets of the terms in each document, so that you can accurately work out where it was in the original sentence/verse even though you may not have the original stored any longer. 
>> 
>> For the purpose of counts I don't think it's necessary, although I haven't tried without yet.
>> Chris
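
As a rough illustration of the term vector idea described above, here is a hand-rolled model in plain Java. It is not Lucene's API. For one document it records, per term, the token positions and character offsets; the frequency is simply the number of recorded positions. The names TermVectorSketch, TermInfo and build are made up for this example, and the input is the first six Strong's numbers of Rev.22.5 joined by single spaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: NOT Lucene's term vector classes.
class TermVectorSketch {

    /** What is remembered about one term within one document. */
    static final class TermInfo {
        final List<Integer> positions = new ArrayList<>(); // token index in the document
        final List<int[]> offsets = new ArrayList<>();     // {start, end} character offsets

        int frequency() {
            return positions.size();
        }
    }

    /** Splits on whitespace and records position and offsets for every token. */
    static Map<String, TermInfo> build(String text) {
        Map<String, TermInfo> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i);
            TermInfo info = vector.computeIfAbsent(term, k -> new TermInfo());
            info.positions.add(position++);
            info.offsets.add(new int[] {start, i});
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, TermInfo> tv = build("G2532 G2071 G3756 G3571 G1563 G2532");
        for (Map.Entry<String, TermInfo> e : tv.entrySet()) {
            System.out.println(e.getKey() + " freq=" + e.getValue().frequency()
                    + " positions=" + e.getValue().positions);
        }
        // First line printed: G2532 freq=2 positions=[0, 5]
    }
}
```

A plain count needs only the frequency; the positions and offsets matter when the original location of a term has to be recovered, which matches the point that term vectors are probably not required for counting.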
>> 
>> 
>> 
>> On 7 February 2013 13:12, DM Smith <dmsmith at crosswire.org> wrote:
>> Not sure if this is the problem:
>> In the KJV, there are a lot of splits of Greek as it translates into English.
>> 
>> For example, in Rev 22.5 look at φωτιζει αυτους which translates directly into English as "gives light to them", but is translated in the KJV as "giveth them light", so "them" splits "giveth light":
>> <verse osisID="Rev.22.5" sID="Rev.22.5"/>
>> <w src="1" lemma="strong:G2532 tr:και" morph="robinson:CONJ">And</w>
>> <w src="4" lemma="strong:G2071 tr:εσται" morph="robinson:V-FXI-3S">there shall be</w>
>> <w src="3" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>> <w src="2" lemma="strong:G3571 tr:νυξ" morph="robinson:N-NSF">night</w>
>> <w src="5" lemma="strong:G1563 tr:εκει" morph="robinson:ADV">there</w>;
>> <w src="6" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>> <w src="9" lemma="strong:G2192 tr:εχουσιν" morph="robinson:V-PAI-3P">they</w>
>> <w src="7" lemma="strong:G5532 tr:χρειαν" morph="robinson:N-ASF">need</w>
>> <w src="8" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
>> <w src="10" lemma="strong:G3088 tr:λυχνου" morph="robinson:N-GSM">candle</w>,
>> <w src="11" lemma="strong:G2532 tr:και" morph="robinson:CONJ">neither</w>
>> <w src="12" lemma="strong:G5457 tr:φωτος" morph="robinson:N-GSN">light</w>
>> <w src="13" lemma="strong:G2246 tr:ηλιου" morph="robinson:N-GSM">of the sun</w>;
>> <w src="14" lemma="strong:G3754 tr:οτι" morph="robinson:CONJ">for</w>
>> <w src="15" lemma="strong:G2962 tr:κυριος" morph="robinson:N-NSM">the Lord</w>
>> <w src="16 17" lemma="strong:G3588 strong:G2316 tr:ο tr:θεος" morph="robinson:T-NSM robinson:N-NSM">God</w>
>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S" type="x-split-3868">giveth</w>
>> <w src="19" lemma="strong:G846 tr:αυτους" morph="robinson:P-APM">them</w>
>> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S" type="x-split-3868">light</w>:
>> <w src="20" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
>> <w src="21" lemma="strong:G936 tr:βασιλευσουσιν" morph="robinson:V-FAI-3P">they shall reign</w>
>> <w src="22" lemma="strong:G1519 tr:εις" morph="robinson:PREP">for</w>
>> <w src="23 24" lemma="strong:G3588 strong:G165 tr:τους tr:αιωνας" morph="robinson:T-APM robinson:N-APM">ever</w>
>> <w src="25 26" lemma="strong:G3588 strong:G165 tr:των tr:αιωνων" morph="robinson:T-GPM robinson:N-GPM">and ever</w>.
>> <milestone type="x-strongsMarkup" resp="pdy 2003-12-31-00:30"/>
>> <verse eID="Rev.22.5"/>
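
To make the split-word effect concrete, the sketch below counts Strong's numbers in an abridged excerpt of the verse above (the morph attributes and tr: values are dropped). It is not the code of StrongsNumberFilter or any other JSword class. The class name SplitWordCount, the count method and the rule for merging splits (elements sharing the same x-split type value count once) are assumptions made for this illustration. Note that elsewhere in this thread Chris reports doubling for words that are not split, so this shows only the effect described here.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: NOT JSword's indexing code.
class SplitWordCount {

    // Abridged Rev.22.5 excerpt, wrapped in a root element so that it parses.
    static final String VERSE =
          "<v>"
        + "<w src=\"16 17\" lemma=\"strong:G3588 strong:G2316\">God</w>"
        + "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">giveth</w>"
        + "<w src=\"19\" lemma=\"strong:G846\">them</w>"
        + "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">light</w>"
        + "</v>";

    /**
     * Counts Strong's numbers found in the lemma attributes of w elements.
     * With mergeSplits set, elements that share an x-split type value
     * contribute each Strong's number only once (an assumed rule).
     */
    static Map<String, Integer> count(String xml, boolean mergeSplits) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            NodeList words = root.getElementsByTagName("w");
            Map<String, Integer> counts = new TreeMap<>();
            Set<String> seenSplits = new HashSet<>();
            for (int i = 0; i < words.getLength(); i++) {
                Element w = (Element) words.item(i);
                String type = w.getAttribute("type");
                for (String token : w.getAttribute("lemma").trim().split("\\s+")) {
                    if (!token.startsWith("strong:")) {
                        continue; // ignore anything that is not a Strong's number
                    }
                    String strong = token.substring("strong:".length());
                    boolean isSplit = type.startsWith("x-split");
                    if (mergeSplits && isSplit && !seenSplits.add(type + "|" + strong)) {
                        continue; // a later part of a split that was already counted
                    }
                    counts.merge(strong, 1, Integer::sum);
                }
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse verse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(VERSE, false)); // {G2316=1, G3588=1, G5461=2, G846=1}
        System.out.println(count(VERSE, true));  // {G2316=1, G3588=1, G5461=1, G846=1}
    }
}
```

Counting per element gives G5461 twice in this verse, while merging the two halves of the split gives it once. Which of the two is the "right" count for the index is a design question this sketch does not settle.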
>> 
>> BTW, I'm not sure what a TermVector is nor how it would be used.
>> 
>> In Him,
>>         DM
>> 
>> On Feb 7, 2013, at 6:36 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>> 
>> > Hi
>> >
>> > Using Luke, and my own code to look at the indexes created by JSword shows that the term count is double what it should be...
>> >
>> > Any ideas why that might be? I can't quite follow the logic in StrongAnalyser, but I tried stepping through it in the debugger and it didn't look like it was double counting. I might need to do that again.
>> >
>> > DM, I haven't checked, but apparently the TermVector may not be what I'm using.
>> >
>> > Chris