<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div><div>On Jan 4, 2013, at 4:34 PM, Chris Burrell <<a href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr"><div style="">There are two separate issues here.</div><div style=""><br></div><div style="">1- The fact that we retrieve the closest match to a Strong's number is IMHO rather obscure and confusing in itself. I've hit this several times and found, through rather laborious investigation, that a module was using a bad Strong's number, or some piece of code hadn't quite formatted the number right, etc.</div></div></blockquote><div><br></div>This is a feature of dictionary lookup, which will typically find the entry with the longest common prefix.</div><div><br></div><div>It'd probably be good to mark some dictionaries as exact match only. Strong's, Robinson's, and maybe daily devotions seem like candidates.</div><div><br><blockquote type="cite"><div dir="ltr">
<div><br></div>2- H00: The KJV is the most obvious example of a module that has/had it. It looks like someone has removed them all in the KJV2006 project (<a href="http://www.crosswire.org/~dmsmith/kjv2006/index.html">http://www.crosswire.org/~dmsmith/kjv2006/index.html</a>). Version 2.3 of the module still has it. Did we replace this with something else? H00 was used to indicate that the first occurrence of the strong number was the same original word as the second one. We were going to put them into the ESV. <div>
<br></div><div>So for example Gen 2.9, used to read something like this:</div><div><br><div><div><div><title type="x-gen">Genesis 2:9</title></div><div><verse osisID="Gen.2.9"></div>
<div><span class="" style="white-space:pre">        </span><w lemma="strong:H04480">And out</w> </div><div><span class="" style="white-space:pre">        </span><w lemma="strong:H0127">of the ground</w> </div>
<div><span class="" style="white-space:pre">        </span><b><w lemma="strong:H00 strong:H06779">made</w> </b></div><div><span class="" style="white-space:pre">        </span><w lemma="strong:H03068">the <seg><divineName>Lord</divineName></seg></w> </div>
<div><span class="" style="white-space:pre">        </span><b><w lemma="strong:H0430">God</w> </b></div><div><span class="" style="white-space:pre">        </span><w lemma="strong:H06779" morph="strongMorph:TH8686">to grow</w> </div>
<div style=""> [ ... ... ... some more stuff goes here ... ... ...]</div><div></verse></div><br></div></div></div><div><br></div><div style="">In the above, this indicates that the translators split the word H06779 into "made" and into "to grow". </div>
<div style=""><br></div><div style="">It seems someone has removed all of these marks. However, we don't have the "src" attribute either, so can anyone suggest how I can tell which bits go together and which bits go apart? What was the reasoning behind this change?</div></div></blockquote><br>I maintain the KJV. I couldn't find a purpose for H00, so I took it out as being wrong. If it marks the splitting of words, we have a mechanism for that in the NT, which could be used. It uses src="XX" (which for the NT ties back to the XX word in the verse in a particular Greek module), type="x-split", and subType="x-NN", where NN is a unique number within the verse having a value greater than the greatest value of src="XX". I'm not at all sure that subType is still needed. src and type are each sufficient to solve the problem.</div><div><br></div><div>A bit more exploring to do on the KJV...<br><div><br><div></div></div><blockquote type="cite"><div dir="ltr">
<div style=""><br></div><div style="">Chris</div><div style=""><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 4 January 2013 21:07, DM Smith <span dir="ltr"><<a href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">H00 is not a valid Strong's number. The modules that have it should be re-done. Do you know which are the problem modules?<br>
<br>
The problem with allowing H00 is that it will not find an entry in a Strong's dictionary and will get the nearest one. Which is better? An error filling the console or confusing the user?<br>
<br>
I don't mind changing the regex to be simpler, but it should not create further problems.<br>
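<br>
A minimal sketch of how the two patterns quoted in this thread treat a few sample inputs. It assumes whole-string matching with matches(); the thread does not show how StrongsNumber actually applies the pattern. The class and method names are hypothetical, and H0430!a is an invented input that only exercises the optional tail of the current pattern. Only the two regular expressions come from the thread:<br>
<pre>
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JSword.
class StrongsPatternDemo {
    // Pattern quoted in the thread as the one currently in use.
    static final Pattern CURRENT =
        Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");

    // Simplified pattern proposed in the thread.
    static final Pattern PROPOSED = Pattern.compile("[GgHh][0-9]+");

    // Assumption: the whole string must match.
    static boolean acceptedByCurrent(String s) {
        return CURRENT.matcher(s).matches();
    }

    static boolean acceptedByProposed(String s) {
        return PROPOSED.matcher(s).matches();
    }

    // Digits captured by the current pattern with leading zeros dropped,
    // or null when the input is rejected.
    static String digits(String s) {
        Matcher m = CURRENT.matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] samples = {"H00", "H06779", "G5547", "H0430!a"};
        for (String s : samples) {
            System.out.println(s
                + " current=" + acceptedByCurrent(s)
                + " proposed=" + acceptedByProposed(s)
                + " digits=" + digits(s));
        }
    }
}
```
</pre>
Under these assumptions the current pattern rejects H00, because it requires at least one non-zero digit, while the proposed pattern accepts it. The proposed pattern in turn rejects any input carrying the optional trailing extension, so a simplified regex would probably still need to allow for that part.<br>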
<br>
The part at the end is an optional extension. We have a module in the wings that has it.<br>
<br>
In Him,<br>
DM<br>
<div><div class="h5"><br>
On Jan 4, 2013, at 3:34 PM, Chris Burrell <<a href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>> wrote:<br>
<br>
> Hi<br>
><br>
> Can I suggest a fix to the StrongNumberFilter, which currently relies on<br>
> org.crosswire.jsword.book.study.StrongsNumber<br>
><br>
> The regular expression used to match the Strong number is:<br>
> private static final Pattern STRONGS_PATTERN = Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");<br>
><br>
> Unfortunately, some texts use H00 as a strong number to indicate that the tagged word is in 2 places (i.e. this is only the first part of the tag).<br>
><br>
> Because the above expression does not match H00, huge amounts of logging are output to the console.<br>
><br>
> I suggest we change it to something like<br>
><br>
> [GgHh][0-9]+<br>
><br>
> Also, what's the stuff at the end of the regex? Haven't come across any like that...<br>
><br>
> Chris<br>
><br>
</div></div>> _______________________________________________<br>
> jsword-devel mailing list<br>
> <a href="mailto:jsword-devel@crosswire.org">jsword-devel@crosswire.org</a><br>
> <a href="http://www.crosswire.org/mailman/listinfo/jsword-devel" target="_blank">http://www.crosswire.org/mailman/listinfo/jsword-devel</a><br>
<br>
</blockquote></div><br></div>
</blockquote></div><br></div></body></html>