[jsword-devel] StrongNumber in indexing

Chris Burrell chris at burrell.me.uk
Fri Jan 4 14:34:17 MST 2013


There are two separate issues here.

1- Retrieving the closest match to a Strong's number is IMHO rather
obscure and confusing in itself. I've hit this several times and found,
only after rather laborious investigation, that a module was using a bad
Strong's number, or that some piece of code hadn't formatted the number
quite right, etc.

2- H00: The KJV is the most obvious example of a module that has/had it. It
looks like someone has removed them all in the KJV2006 project (
http://www.crosswire.org/~dmsmith/kjv2006/index.html). Version 2.3 of the
module still has it. Did we replace this with something else? H00 was used
to mark the first part of a split word: the word tagged with H00 and a
later word carrying the same Strong's number translate one and the same
original word. We were going to put them into the ESV.

So, for example, Gen 2.9 used to read something like this:

<div><title type="x-gen">Genesis 2:9</title>
<verse osisID="Gen.2.9">
<w lemma="strong:H04480">And out</w>
<w lemma="strong:H0127">of the ground</w>
*<w lemma="strong:H00 strong:H06779">made</w> *
<w lemma="strong:H03068">the <seg><divineName>Lord</divineName></seg></w>
*<w lemma="strong:H0430">God</w> *
<w lemma="strong:H06779" morph="strongMorph:TH8686">to grow</w>
         [ ... ... ... some more stuff goes here ... ... ...]
</verse></div>

In the above, the H00 marker on "made" indicates that the translators split
the single original word H06779 across "made" and "to grow".
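
To make the pairing concrete, here is a minimal Java sketch of how the H00
marker could be read when it is present. It assumes the text and lemma
attribute of each <w> element have already been extracted in verse order.
SplitWordPairs, Word, pairSplitWords and genesis2v9 are hypothetical names
for illustration only, not part of JSword, and the sample data covers only
the words shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JSword: pairs an H00-marked word with the
// next later word that carries the same Strong's number.
class SplitWordPairs {
    static final String MARKER = "strong:H00";

    /** Minimal stand-in for a <w> element: its text and raw lemma attribute. */
    static final class Word {
        final String text;
        final String lemma;

        Word(String text, String lemma) {
            this.text = text;
            this.lemma = lemma;
        }

        /** The lemma attribute is a whitespace-separated list of values. */
        List<String> lemmas() {
            return Arrays.asList(lemma.trim().split("\\s+"));
        }
    }

    /** Returns one "first + second = lemma" line per split word found. */
    static List<String> pairSplitWords(List<Word> verse) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < verse.size(); i++) {
            Word first = verse.get(i);
            if (!first.lemmas().contains(MARKER)) {
                continue; // not the first half of a split word
            }
            for (String lemma : first.lemmas()) {
                if (lemma.equals(MARKER)) {
                    continue; // skip the marker itself
                }
                // Look forward for the next word tagged with the same number.
                for (int j = i + 1; j < verse.size(); j++) {
                    Word second = verse.get(j);
                    if (second.lemmas().contains(lemma)) {
                        pairs.add(first.text + " + " + second.text + " = " + lemma);
                        break;
                    }
                }
            }
        }
        return pairs;
    }

    /** Only the words quoted in this mail; the rest of the verse is omitted. */
    static List<Word> genesis2v9() {
        return Arrays.asList(
            new Word("And out", "strong:H04480"),
            new Word("of the ground", "strong:H0127"),
            new Word("made", "strong:H00 strong:H06779"),
            new Word("the Lord", "strong:H03068"),
            new Word("God", "strong:H0430"),
            new Word("to grow", "strong:H06779"));
    }

    public static void main(String[] args) {
        for (String pair : pairSplitWords(genesis2v9())) {
            System.out.println(pair);
        }
    }
}
```

Run against the sample above, this reports "made" and "to grow" as the two
halves of H06779. Without the H00 marker (or a "src" attribute) the same
loop has nothing to key on, which is the problem described below.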

It seems someone has removed all of these marks. However, we don't have the
"src" tag either, so can anyone suggest how I can tell which bits go
together and which bits go apart? What was the reasoning behind this change?

Chris



On 4 January 2013 21:07, DM Smith <dmsmith at crosswire.org> wrote:

> H00 is not a valid Strong's number. The modules that have it should be
> re-done. Do you know which are the problem modules?
>
> The problem with allowing H00 is that it will not find an entry in a
> Strong's dictionary and will get the nearest one. Which is better? An error
> filling the console or confusing the user?
>
> I don't mind changing the regex to be simpler, but it should not create
> further problems.
>
> The part at the end is an optional extension. We have a module in the
> wings that has it.
>
> In Him,
>         DM
>
> On Jan 4, 2013, at 3:34 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> > Hi
> >
> > Can I suggest a fix to the StrongNumberFilter, which currently relies on
> > org.crosswire.jsword.book.study.StrongsNumber
> >
> > The regular expression used to match the Strong number is:
> > private static final Pattern STRONGS_PATTERN =
> >     Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");
> >
> > Unfortunately, some texts use H00 as a strong number to indicate that
> > the tagged word is in 2 places (i.e. this is only the first part of the
> > tag).
> >
> > The above expression causes huge amounts of Logging to be output to the
> > console.
> >
> > I suggest we change it to something like
> >
> > [GgHh][0-9]+
> >
> > Also, what's the stuff at the end of the regex? Haven't come across any
> > like that...
> >
> > Chris
>
>
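
For reference, a minimal Java sketch comparing the two patterns quoted
above. The pattern strings are copied from the thread; the class name
StrongsPatternDemo and the helpers accepts and describe are hypothetical.
The sketch uses full-string matches() purely for illustration, since the
thread does not show how StrongsNumber actually applies the pattern.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of JSword.
class StrongsPatternDemo {
    // Pattern quoted in the thread as the one used by StrongsNumber.
    static final Pattern CURRENT =
        Pattern.compile("([GgHh])0*([1-9][0-9]*)!?([A-Za-z]+)?");

    // Simpler pattern suggested in the thread.
    static final Pattern SUGGESTED = Pattern.compile("[GgHh][0-9]+");

    /** True if the whole input matches the pattern (an assumption of this sketch). */
    static boolean accepts(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    /** Splits an input into language, number and optional suffix using CURRENT. */
    static String describe(String input) {
        Matcher m = CURRENT.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        String language = m.group(1).toUpperCase(Locale.ROOT);
        String number = m.group(2); // leading zeros are consumed by 0*
        String part = m.group(3);   // optional trailing letters, may be null
        return language + ":" + number + (part == null ? "" : ":" + part);
    }

    public static void main(String[] args) {
        String[] samples = {"H00", "H06779", "G1234!a"};
        for (String s : samples) {
            System.out.println(s
                + " current=" + accepts(CURRENT, s)
                + " suggested=" + accepts(SUGGESTED, s)
                + " parsed=" + describe(s));
        }
    }
}
```

The current pattern rejects H00 because at least one non-zero digit must
follow the leading zeros, while the suggested pattern accepts it. The
suggested pattern, matched against the whole string, no longer accepts a
trailing extension such as "!a" (a made-up sample value here), so the
optional part at the end would need to be kept if that extension matters.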