[jsword-devel] Term counts is double what it should be

Chris Burrell chris at burrell.me.uk
Thu Feb 7 06:22:46 MST 2013


No doubt that would cause an issue too, but my case here applies to most
words, even those that are not split.
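
To make the split case concrete, here is a rough sketch in plain Java. It is
not JSword's StrongAnalyser logic; the class name, the regexes and the idea
of de-duplicating on the src attribute are all assumptions for illustration.
It counts Strong's numbers in a fragment of the Rev.22.5 markup quoted below
(attributes trimmed), once per <w> element and once per distinct src value:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; this is not how JSword is known to count terms.
class SplitCountSketch {
    // Assumes src comes directly before lemma inside <w ...>, as in the quoted verse.
    private static final Pattern W =
        Pattern.compile("<w\\s+src=\"([^\"]+)\"\\s+lemma=\"([^\"]+)\"");
    private static final Pattern STRONG = Pattern.compile("strong:(\\S+)");

    /** Counts Strong's numbers; optionally counts each (src, number) pair once. */
    static Map<String, Integer> count(String osisVerse, boolean dedupeBySrc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>();
        Matcher w = W.matcher(osisVerse);
        while (w.find()) {
            String src = w.group(1);
            Matcher s = STRONG.matcher(w.group(2));
            while (s.find()) {
                String number = s.group(1);
                // A split word repeats the same src, so the pair was seen before.
                if (dedupeBySrc && !seen.add(src + "|" + number)) {
                    continue;
                }
                counts.merge(number, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Trimmed from the quoted verse: "giveth" and "light" share src="18".
        String verse =
            "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">giveth</w>"
          + "<w src=\"19\" lemma=\"strong:G846\">them</w>"
          + "<w src=\"18\" lemma=\"strong:G5461\" type=\"x-split-3868\">light</w>";
        System.out.println(count(verse, false)); // {G5461=2, G846=1}
        System.out.println(count(verse, true));  // {G5461=1, G846=1}
    }
}
```

Keying on src rather than on the Strong's number alone means a word that
genuinely occurs several times in a verse (G2532 has src 1, 6 and 11 in
Rev.22.5) is still counted each time. It would not explain a doubling of
words that are not split, which is what I am seeing.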

I think a term vector allows you to store the position/offsets of the terms
in each document, so that you can accurately work out where it was in the
original sentence/verse even though you may not have the original stored
any longer.
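
Roughly what I have in mind, as a plain-Java sketch. This is not Lucene's
term vector API; TermVectorSketch, TermInfo and the whitespace tokenising
are made up for illustration. For each term in one document it records the
frequency, the token positions and the character offsets:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Lucene's or JSword's actual classes.
class TermVectorSketch {

    /** What a term vector could hold for one term in one document. */
    static final class TermInfo {
        int freq;                                           // occurrences in the document
        final List<Integer> positions = new ArrayList<>();  // 0-based token index
        final List<int[]> offsets = new ArrayList<>();      // {startChar, endChar}
    }

    /** Builds a per-document term vector by splitting on whitespace. */
    static Map<String, TermInfo> build(String text) {
        Map<String, TermInfo> vector = new LinkedHashMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (start == i) {
                break; // only trailing whitespace was left
            }
            TermInfo info = vector.computeIfAbsent(text.substring(start, i), t -> new TermInfo());
            info.freq++;
            info.positions.add(position++);
            info.offsets.add(new int[] {start, i});
        }
        return vector;
    }

    public static void main(String[] args) {
        TermInfo info = build("G2532 G3756 G2532").get("G2532");
        System.out.println(info.freq + " " + info.positions); // 2 [0, 2]
    }
}
```

The positions and offsets are what let you map a term back to its place in
the verse; the freq field on its own is all a count needs.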

For the purpose of counts I don't think it's necessary, although I haven't
tried without yet.
Chris



On 7 February 2013 13:12, DM Smith <dmsmith at crosswire.org> wrote:

> Not sure if this is the problem:
> In the KJV, there are a lot of splits of Greek as it translates into
> English.
>
> For example, in Rev 22.5 look at φωτιζει αυτους which translates directly
> into English as "gives light to them", but is translated in the KJV as
> "giveth them light", so "them" splits "giveth light":
> <verse osisID="Rev.22.5" sID="Rev.22.5"/>
> <w src="1" lemma="strong:G2532 tr:και" morph="robinson:CONJ">And</w>
> <w src="4" lemma="strong:G2071 tr:εσται" morph="robinson:V-FXI-3S">there
> shall be</w>
> <w src="3" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
> <w src="2" lemma="strong:G3571 tr:νυξ" morph="robinson:N-NSF">night</w>
> <w src="5" lemma="strong:G1563 tr:εκει" morph="robinson:ADV">there</w>;
> <w src="6" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
> <w src="9" lemma="strong:G2192 tr:εχουσιν"
> morph="robinson:V-PAI-3P">they</w>
> <w src="7" lemma="strong:G5532 tr:χρειαν" morph="robinson:N-ASF">need</w>
> <w src="8" lemma="strong:G3756 tr:ουκ" morph="robinson:PRT-N">no</w>
> <w src="10" lemma="strong:G3088 tr:λυχνου"
> morph="robinson:N-GSM">candle</w>,
> <w src="11" lemma="strong:G2532 tr:και" morph="robinson:CONJ">neither</w>
> <w src="12" lemma="strong:G5457 tr:φωτος" morph="robinson:N-GSN">light</w>
> <w src="13" lemma="strong:G2246 tr:ηλιου" morph="robinson:N-GSM">of the
> sun</w>;
> <w src="14" lemma="strong:G3754 tr:οτι" morph="robinson:CONJ">for</w>
> <w src="15" lemma="strong:G2962 tr:κυριος" morph="robinson:N-NSM">the
> Lord</w>
> <w src="16 17" lemma="strong:G3588 strong:G2316 tr:ο tr:θεος"
> morph="robinson:T-NSM robinson:N-NSM">God</w>
> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S"
> type="x-split-3868">giveth</w>
> <w src="19" lemma="strong:G846 tr:αυτους" morph="robinson:P-APM">them</w>
> <w src="18" lemma="strong:G5461 tr:φωτιζει" morph="robinson:V-PAI-3S"
> type="x-split-3868">light</w>:
> <w src="20" lemma="strong:G2532 tr:και" morph="robinson:CONJ">and</w>
> <w src="21" lemma="strong:G936 tr:βασιλευσουσιν"
> morph="robinson:V-FAI-3P">they shall reign</w>
> <w src="22" lemma="strong:G1519 tr:εις" morph="robinson:PREP">for</w>
> <w src="23 24" lemma="strong:G3588 strong:G165 tr:τους tr:αιωνας"
> morph="robinson:T-APM robinson:N-APM">ever</w>
> <w src="25 26" lemma="strong:G3588 strong:G165 tr:των tr:αιωνων"
> morph="robinson:T-GPM robinson:N-GPM">and ever</w>.
> <milestone type="x-strongsMarkup" resp="pdy 2003-12-31-00:30"/>
> <verse eID="Rev.22.5"/>
>
> BTW, I'm not sure what a TermVector is nor how it would be used.
>
> In Him,
>         DM
>
> On Feb 7, 2013, at 6:36 AM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> > Hi
> >
> > Using Luke and my own code to look at the indexes created by JSword
> > shows that the term count is double what it should be...
> >
> > Any ideas why that might be? I can't quite follow the logic in
> > StrongAnalyser, but I tried stepping through it in the debugger and it
> > didn't look like it was double counting. Might need to do that again.
> >
> > DM, I haven't checked, but apparently the TermVector may not be what I'm
> > using.
> >
> > Chris