[tyndale-devel] Strongs Tagging for ESV

David Instone-Brewer davidinstonebrewer at gmail.com
Sun Mar 20 03:02:54 MST 2011

Troy, is there more than one KJV tagging system?
BibleWorks says their Strongs tagging comes from Online.

David IB

At 00:15 20/03/2011, David Instone-Brewer wrote:
>Whoops - thanks for this! I had the facts back to front.
>Well, this is good news in that the KJV tagging is better than the 
>NASB, but bad news in that the NASB is closer to the ESV than the KJV is.
>Thanks for warning us before we did too much work.
>David IB
>At 23:34 19/03/2011, Troy A. Griffitts wrote:
>>Nice job guys.  Just a point of clarification:
>>On 03/19/2011 01:04 PM, David Instone-Brewer wrote:
>> > 4) merge the resultant text with the verb parsing in the tagged KJV
>>I'm confused a bit about where the NASB and KJV come into play with 
>>your tagging efforts.
>> > Since starting this, I've heard from Troy who originally 
>> organised the team who tagged the NASB.
>> > He says his method is:
>>We did not tag the NASB.  We tagged the KJV.  I would not use the 
>>NASB markup if I was doing this project, to avoid any copyright 
>>infringement of Lockman's data.
>>On 03/19/2011 09:54 PM, David Instone-Brewer wrote:
>>>Dear Rob
>>>I've been doing some experiments with Gen.1 to work out a system.
>>>I've found a method which works really well - the whole tagging of 
>>>Gen.1 has been done correctly by automatic comparisons and it has 
>>>only gone wrong in a few verses.
>>>I've tried using Stanford's parsing engine at 
>>>but this didn't fix it. I've attached a file listing my 
>>>experiments and their results.
>>>I think what would fix it is a semantic domain dictionary. What's 
>>>happened is that the two versions are too different in v. 11:
>>>ESV: And God said, "Let the earth sprout vegetation, plants 
>>>yielding seed, and fruit trees bearing fruit in which is their 
>>>seed, each according to its kind, on the earth." And it was so.
>>>NASB: Then God said, "Let the earth sprout vegetation: plants 
>>>yielding seed, and fruit trees on the earth bearing fruit after 
>>>their kind with seed in them"; and it was so.
>>>The change in order in the words in bold makes it too difficult 
>>>for the comparison program to match things up.
>>>I think we will need humans at these points, but I think we can 
>>>highlight the likely places where problems exist.
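One way to highlight the likely problem verses is to score how closely the two translations agree word by word and flag anything below a cut-off. The following is a minimal sketch only, assuming Python's standard difflib module as a stand-in for the comparison tool; the function names and the 0.8 threshold are illustrative and not part of the workflow described in this thread.

```python
import difflib
import re


def words(text):
    """Lower-case a verse and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def similarity(verse_a, verse_b):
    """Return a word-level similarity ratio between 0.0 and 1.0."""
    matcher = difflib.SequenceMatcher(None, words(verse_a), words(verse_b))
    return matcher.ratio()


def needs_review(verse_a, verse_b, threshold=0.8):
    """Flag a verse pair for human checking when the ratio is below threshold.

    The threshold is a guess and would need tuning against real data.
    """
    return similarity(verse_a, verse_b) < threshold
```

Verses such as Gen 1:11, where the word order differs between the versions, would score lower than verses that differ by only a word or two, so they could be listed first for human attention.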
>>>Tomorrow I'll have a go at producing the whole text of Genesis, so 
>>>you have some data to play with
>>>David IB
>>>The process we are attempting is:
>>>1) convert the NASB XML text to something which looks like a 
>>>BibleWorks exported text
>>>   (ie each verse on one line starting with a simple ref (eg Gen 
>>> 1:1 In the beginning...)
>>>2) use the Word 2003+ text comparison tools (which are much 
>>>superior to Word 97) to compare the text of both versions 
>>>producing something like:
>>>Gen 1:2  <w H776>The earth</w> was formless <w H8414>formless</w> 
>>><w H922>and void</w>, and <w H2822>darkness </w> <w H5921>was 
>>>over</w> the <w H6440>surface </w> <w H8415>of the deep</w>, and . 
>>>And <w H7307>the Spirit </w> <w H430>of God</w> was <w 
>>>H7363!b>moving hovering </w> <w H5921>over</w> the <w 
>>>H6440>surface </w> <w H4325>of the waters.  </w>.
>>>3) create a site where a human can easily correct this automatic markup
>>>  - eg the proof of concept at
>>> <http://www.slowley.com/tagger-proof-of-concept/example.html>
>>>4) merge the resultant text with the verb parsing in the tagged KJV
>>>Since starting this, I've heard from Troy who originally organised 
>>>the team who tagged the NASB. He says his method is:
>>>1) starts with a lemma tagged text, the KJV, and CrossWay's ESV 
>>>data in OSIS format.
>>>2) the ESV module is iterated each verse at a time and is 
>>>processed as such:
>>>3) the OSIS markup is stripped from the ESV text and positioning 
>>>information is retained
>>>4) a word table is built from the KJV text:
>>>        KJV Word 1    |    Strongs #
>>>        KJV Word 2    |    Strongs #
>>>5) a second table is built from the ESV text:
>>>        ESV Word 1    |
>>>        ESV Word 2    |
>>>6) these tables are passed to a function which is responsible 
>>>solely for the logic to fill in the second part of the second 
>>>table with Strong's numbers.
>>>7) the returned table is used to reconstitute the OSIS tags to 
>>>the ESV text including word-level Strong's markup.
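Steps 4 to 6 can be sketched roughly as follows. The thread does not say how the fill-in function works, so the alignment here (Python's difflib matching identical words in the same order) is purely an assumption, and `fill_strongs` is a hypothetical name.

```python
import difflib


def fill_strongs(kjv_table, esv_words):
    """Copy Strong's numbers from a KJV word table onto ESV words.

    kjv_table: list of (word, strongs_number) pairs for one verse.
    esv_words: list of ESV words for the same verse.
    Returns a list of (esv_word, strongs_number or None) pairs.
    Only words that match exactly (ignoring case) are filled in;
    everything else is left as None for a human to resolve.
    """
    kjv_lower = [word.lower() for word, _ in kjv_table]
    esv_lower = [word.lower() for word in esv_words]
    result = [(word, None) for word in esv_words]

    matcher = difflib.SequenceMatcher(None, kjv_lower, esv_lower, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            strongs = kjv_table[block.a + offset][1]
            esv_index = block.b + offset
            result[esv_index] = (esv_words[esv_index], strongs)
    return result
```

For example, with a KJV table of `[("God", "H430"), ("created", "H1254")]` and the ESV words `["God", "made"]`, the first word receives H430 and the second stays untagged.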
>>>A screenshot of the community collaboration tool for the KJV 
>>>Strongs markup project is at 
>>>We're hoping to convert it to a web application instead of a 
>>>standalone Java GUI, but that hasn't happened yet.
>>>I'd love to work together on this effort.  Please keep me posted 
>>>on any progress and let me know if I can help in any way.
>>>At 10:18 17/03/2011, Robert Slowley wrote:
>>>>So, presumably if you could script it to break each chapter in to a
>>>>separate file, do the comparisons, and then re-export as a single file
>>>>we could import that in to a tool like mine so a human could fix the
>>>>errors and do the bits the auto-comparison failed to do.
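The chapter-splitting step could be scripted along these lines. This is a sketch only, assuming the one-verse-per-line format with a simple reference at the start of each line (eg "Gen 1:1 In the beginning..."); the function names are hypothetical, and lines that do not start with a reference are silently skipped.

```python
import re

# Book name, chapter:verse, then the verse text. The book pattern is
# non-greedy so that names such as "1 Sam" are also accepted.
REF = re.compile(r"^(.+?)\s+(\d+):(\d+)\s+(.*)$")


def split_by_chapter(lines):
    """Group verse lines into {(book, chapter): [(verse, text), ...]}."""
    chapters = {}
    for line in lines:
        match = REF.match(line.strip())
        if match is None:
            continue  # not a verse line
        book, chapter, verse, text = match.groups()
        chapters.setdefault((book, int(chapter)), []).append((int(verse), text))
    return chapters


def rejoin(chapters):
    """Flatten the per-chapter groups back into one list of verse lines."""
    lines = []
    for (book, chapter), verses in chapters.items():
        for verse, text in verses:
            lines.append(f"{book} {chapter}:{verse} {text}")
    return lines
```

Each chapter's verses could then be written to a separate file for comparison and the results passed back through `rejoin` to produce a single file for import.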
>>>>On Tue, Mar 15, 2011 at 8:19 AM, David Instone-Brewer
>>>><davidinstonebrewer at gmail.com> wrote:
>>>> > From the automatic comparisons produced by Word, we get:
>>>> >
>>>> > Gen 1:1  <w H7225>In the beginning,</w> <w H430>God</w> <w
>>>> > H1254!a>created</w> <w H8064>the heavens</w> <w H776>and the earth </w>.
>>>> > Gen 1:2  <w H776>The earth</w> was <w H8414>without form</w> <w H922>and
>>>> > void</w>, and <w H2822>darkness</w> <w H5921>was over</w> the <w
>>>> > H6440>face</w> <w H8415>of the deep</w>. And <w H7307>the Spirit</w> <w
>>>> > H430>of God</w> was <w H7363!b>hovering </w> <w H5921>over</w> the <w
>>>> > H6440>face</w> <w H4325>of the waters  </w>.
>>>> >
>>>> > - ie the first two verses are already perfectly tagged. In 
>>>> fact there aren't
>>>> > any problems in Gen.1 till we get to v.5:
>>>> >
>>>> > Gen 1:5  <w H430>God</w> <w H7121>called</w> <w H216>the light</w> <w
>>>> > H3117>Day</w>, <w H2822>and the darkness</w> <w H7121>he called</w> <w
>>>> > H3915>Night.</w>. And <w H6153>there was evening</w> <w 
>>>> H1242>and there was
>>>> > morningthe first</w>, <w H259>one</w> <w H3117>day</w>.
>>>> >
>>>> > The problem is that Word gives up making these comparisons after a few
>>>> > chapters.
>>>> > Some of these problems can be cleared up by macros.
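The `<w H...>` markup in the examples above is regular enough to be read back mechanically, which may help when checking the comparison output. A minimal sketch, assuming tags always have the form `<w Hnnnn>` with an optional suffix such as `!a` (kept verbatim, since its meaning is not explained here); `parse_tagged` is a hypothetical helper, and untagged words between the tags are ignored.

```python
import re

# A tag may wrap across a line break, so allow any whitespace after "<w".
TAGGED = re.compile(r"<w\s+(H\d+(?:![a-z])?)>(.*?)</w>", re.DOTALL)


def parse_tagged(verse):
    """Return [(phrase, strongs_number), ...] for each <w H...> span.

    Whitespace inside a phrase is collapsed to single spaces.
    """
    return [
        (" ".join(text.split()), strongs)
        for strongs, text in TAGGED.findall(verse)
    ]
```

Running this over Gen 1:1 as shown above would give pairs such as `("In the beginning,", "H7225")` and `("God", "H430")`.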
>>>> >
>>>> > David IB
>>>> >
>>>> > At 00:43 15/03/2011, Robert Slowley wrote:
>>>> >
>>>> >> I think I can produce a better text to produce something 
>>>> which has less to
>>>> >> correct.
>>>> > What do you mean?
>>>> >
>>>> >> It would be useful to have transliterated Hebrew and a 
>>>> single-word meaning
>>>> >> instead of the numbers.
>>>> > I have an electronic copy of the stuff you get on popups on
>>>> > 
>>>> http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30 
>>>> > for Strongs already - which I was planning to integrate. If the
>>>> > numbers are replaced with 'transliterated Hebrew' or a 'single-word
>>>> > meaning' what specifically would that mean?
>>>> >
>>>> > For instance on
>>>> > 
>>>> http://classic.net.bible.org/verse.php?search=Genesis%201:30&book=genesis&chapter=1&verse=30 
>>>> > for the strongs reference h03651, which is the transliterated hebrew,
>>>> > and which is the single word meaning?
>>>> >
>>>> >> It would be useful to divide the top line by the tagging, not by any
>>>> >> English
>>>> >> parsing
>>>> >>  eg Gen.1.30  || and to every thing (h3605 )||
>>>> >>   instead of     || and to every (h3605) ||  thing (h3605 ) ||
>>>> > In the case of Genesis 1:30 the text behind it is:
>>>> > NASB: ... <w H3605>and to every</w> <w H3605>thing</w> ...
>>>> >
>>>> > Presumably there is a reason for the text to have two separate sets of
>>>> > words both tagged individually with H3605? Or is it just a markup
>>>> > error?
>>>> >
>>>> > Presumably in some cases words should be merged if they have the
>>>> > same strongs and are next to each other, but in other cases, this
>>>> > isn't the case, e.g. Isa 6:3
>>>> > 
>>>> http://classic.net.bible.org/verse.php?search=isa%206:3&book=isa&chapter=6&verse=3 
>>>> >
>>>> > Has:
>>>> >
>>>> > <w H6918>Holy</w>, <w H6918>Holy</w>, <w H6918>Holy</w>, is the <w
>>>> > H3068>Lord</w> <w H6635>of hosts</w>
>>>> >
>>>> > because the Hebrew has swdq repeated 3 times, and I assume that the
>>>> > reader who understands Strong's gets this indication by it being
>>>> > repeated rather than there being <w H6918>Holy, Holy, Holy</w>. Is
>>>> > that right?
>>>> >
>>>> >> It might be better to have the bottom line with a separate box for every
>>>> >> word. Sometimes we will want to divide things up differently
>>>> > As I see it we have 'phrases' (a set of one or more words) which may
>>>> > have one or more strongs references. In some cases a set of words will
>>>> > have a shared strongs reference, but in other cases like Isa 6:3 sets
>>>> > of contiguous words may have the same strongs references but still be
>>>> > separate 'phrases'. As I see it there's no way of automatically
>>>> > working this out.
>>>> >
>>>> > What I was thinking was to have some algorithm that tries to
>>>> > automatically map the NASB strongs annotations on to the ESV text,
>>>> > similar to what I have already crudely done here. That can either try
>>>> > to group things as the NASB does (where a set of contiguous words
>>>> > share a strongs reference), or do what I have done here (which is
>>>> > easier) which is to automatically group words in to a 'phrase' where
>>>> > they share the same strongs references.
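The simpler of the two groupings, merging neighbouring words that share a strongs reference, might look like the sketch below. It assumes each word has already been paired with a reference or with None if untagged; `group_phrases` is a hypothetical name.

```python
from itertools import groupby


def group_phrases(tagged_words):
    """Merge runs of adjacent words that share the same Strong's number.

    tagged_words: list of (word, strongs_number or None) pairs.
    Untagged words (None) are never merged with each other.
    """
    phrases = []
    for strongs, run in groupby(tagged_words, key=lambda pair: pair[1]):
        run_words = [word for word, _ in run]
        if strongs is None:
            phrases.extend((word, None) for word in run_words)
        else:
            phrases.append((" ".join(run_words), strongs))
    return phrases
```

Note that this merges Isa 6:3 into a single "Holy Holy Holy" phrase tagged H6918, which is exactly the case where the words should stay separate, so a human would still need a way to demerge them.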
>>>> >
>>>> > Either way not all of the ESV can be automatically annotated in this
>>>> > way, the annotation will be wrong in some cases, and the automated
>>>> > grouping may be wrong in some cases. So I was thinking of making the
>>>> > interface such that once the automated grouping has been attempted the
>>>> > end user can click on a box which will make it selected, then click on
>>>> > the next box to the left or right (and so on), when this is done a
>>>> > button for "merging in to a phrase" would appear - then if this is
>>>> > clicked they would be made in to a phrase and could have their strongs
>>>> > references assigned. Alternatively clicking on a box that represents a
>>>> > phrase of one or more words will cause a "demerge" button to appear
>>>> > that will separate out all the words. This will allow the end user to
>>>> > handle both types of situation.
>>>> >
>>>> > I also thought some sort of "This verse is tagged correctly" button
>>>> > would be good. In some cases the program will annotate everything, but
>>>> > it will still need to be checked by a human - and a human may wish
>>>> > their annotation to be checked by someone else for quality purposes.
>>>> > When a verse is marked as correct, it can have a tick or something,
>>>> > and there can be a page of "verses that need work" which it would
>>>> > automatically be removed from. Does that sound sensible?
>>>> >
>>>> > We have easy access to the SBLGNT (with apparatus) and Leningrad
>>>> > Codex. Is it worthwhile including those for each verse? I don't know
>>>> > what process an annotator would go through, and what level of
>>>> > knowledge of the original languages they would use.
>>>> >
>>>> > I worked a bit today on tidying up the classes I've written, and
>>>> > improving the processing of the text (in the next few weeks I'll send
>>>> > you a list of the suspicious stuff I found while processing your files
>>>> > ;-) ). I'm away next week for my 1st year's anniversary holiday - but
>>>> > after that can start work on making this in to an actual web app that
>>>> > would be useful rather than a static web page demo of the sort of
>>>> > thing I had in mind.
>>>> >
>>>> > Any thoughts / comments / ideas appreciated!
>>>> >
>>>> > It'd probably be a good idea to see if we can improve the automatic
>>>> > annotation of the ESV from the NASB if we can, as any progress made
>>>> > here before people start manually annotating / checking will reduce
>>>> > the amount of man hours needed to complete the task.
>>>> >
>>>> > -Rob
>>>> > --
>>>> > http://www.slowley.com/
>>>> >
>>>> > "On two occasions, I have been asked [by members of Parliament],
>>>> > 'Pray, Mr. Babbage, if you put into the machine wrong figures, will
>>>> > the right answers come out?' I am not able to rightly apprehend the
>>>> > kind of confusion of ideas that could provoke such a question."
>>>> > -- Charles Babbage (1791-1871)
>>>tyndale-devel mailing list
>>>tyndale-devel at crosswire.org