[tyndale-devel] Strongs Tagging for ESV
art at arthurbolstad.com
Sat Mar 19 07:57:18 MST 2011
The better the automation, the more likely it is that something like this could be
done for other languages, such as Malagasy, where I am this month. This
thought might make the extra work seem more worthwhile.
Looking forward to what comes of this,
On 3/19/2011 4:04 PM, David Instone-Brewer wrote:
> Dear Rob
> (I may have sent this before, but I found it in my out mail with a
> wrong address on it. Apologies if you get it twice)
> Could you have a go at displaying the transliterated Hebrew and a
> one-word translation as well as the Strong's number? (I'm not sure where
> you will put it all!)
> I'll send you a file with the data - you'd use the non-italic text of
> col. 4 and the first word of the last col.
> This file doesn't have !a and !b, so just put in the word for the
> number, but don't lose the !a & !b.
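A minimal Python sketch of that lookup, assuming the transliteration (non-italic text of col. 4) and the one-word gloss (first word of the last col.) have already been read from the file into a dictionary keyed by the plain Strong's number. The `LEXICON` entries are illustrative placeholders, not rows from that file, and `annotate` is a hypothetical helper. The point is that the !a/!b suffix is stripped only for the lookup and is kept on the displayed label.

```python
# Illustrative placeholder rows (NOT taken from the data file):
# plain Strong's number -> (transliteration, one-word gloss)
LEXICON = {
    "H430": ("elohim", "God"),
    "H1254": ("bara", "create"),
    "H7363": ("rachaph", "hover"),
}

def split_suffix(ref):
    """Split 'H7363!b' into ('H7363', '!b'); a plain ref gets an empty suffix."""
    base, sep, suffix = ref.partition("!")
    return base, sep + suffix

def annotate(ref):
    """Build a display label, looking up the base number but keeping !a/!b."""
    base, suffix = split_suffix(ref)
    translit, gloss = LEXICON.get(base, ("?", "?"))  # unknown numbers show '?'
    return f"{base}{suffix} {translit} '{gloss}'"
```

For example, `annotate("H7363!b")` gives `H7363!b rachaph 'hover'`, so the suffix survives even though the file has no !a/!b rows.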
> And, if we split up the second line into separate words, we might try
> to provide some way to help people pick two or three consecutive words
> in the same tag.
> And it would be great to have the option of making the Hebrew visible
> (though it should probably be invisible by default so as not to
> frighten people away).
> I'm hoping that we are creating something which can be used for other
> translations - not just the ESV.
> David IB
> Here's a summary of where we are:
> We are trying to make a version of the ESV with Strong's tagging, using the
> tagged NASB text as a starting point.
> The process we are attempting is:
> 1) convert the NASB XML text to something which looks like a
> BibleWorks exported text
> (i.e. each verse on one line, starting with a simple ref, e.g. "Gen 1:1 In
> the beginning...")
> 2) use the Word 2003+ text comparison tools (which are much better than
> those in Word 97) to compare the text of both versions, producing something like:
> Gen 1:2 <w H776>The earth</w> was *formless *<w
> H8414>*formless*</w> <w H922>and void</w>, and <w H2822>darkness
> </w> <w H5921>was over</w> the <w H6440>*sur*face </w> <w H8415>of
> the deep</w>*, and . And *<w H7307>the Spirit </w> <w H430>of
> God</w> was <w H7363!b>*moving hovering *</w> <w H5921>over</w>
> the <w H6440>*sur*face </w> <w H4325>of the waters. </w>.
> 3) create a site where humans can easily correct this automatic markup
> - e.g. the proof of concept here
> 4) merge the resultant text with the verb parsing in the tagged KJV
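Several of these steps depend on reading the `<w Hnnnn>...</w>` markup back out of a verse line. A minimal Python sketch, assuming the BibleWorks-style format from step 1 (a reference such as "Gen 1:1" with a single-token book abbreviation, then the text) and one Strong's number per tag. It does not understand the *...* change marks in Word's comparison output, and `parse_verse` is a hypothetical name.

```python
import re

# One tagged phrase, e.g. <w H7363!b>hovering </w>; an optional !a/!b
# suffix stays part of the reference. H = Hebrew, G = Greek numbers.
TAG = re.compile(r"<w ([HG]\d+(?:![a-z])?)>(.*?)</w>", re.S)
# "Gen 1:2 rest-of-verse": book abbreviation, chapter:verse, then the text.
REF = re.compile(r"^(\S+ \d+:\d+)\s+(.*)$", re.S)

def parse_verse(line):
    """Split a tagged verse line into (ref, segments).

    Each segment is (text, strongs); strongs is None for untagged text
    such as "was" or trailing punctuation between tags.
    """
    m = REF.match(line.strip())
    if not m:
        raise ValueError(f"no reference at start of line: {line[:20]!r}")
    ref, body = m.groups()
    segments, pos = [], 0
    for tag in TAG.finditer(body):
        between = body[pos:tag.start()].strip()
        if between:
            segments.append((between, None))
        segments.append((tag.group(2).strip(), tag.group(1)))
        pos = tag.end()
    tail = body[pos:].strip()
    if tail:
        segments.append((tail, None))
    return ref, segments
```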
> Since starting this, I've heard from Troy, who originally organised the
> team who tagged the NASB. He says his method is:
> 1) start with a lemma-tagged text (the KJV) and Crossway's ESV
> data in OSIS format.
> 2) the ESV module is iterated over one verse at a time, and each
> verse is processed as follows:
> 3) the OSIS markup is stripped from the ESV text and positioning
> information is retained
> 4) a word table is built from the KJV text:
> KJV Word 1 | Strongs #
> KJV Word 2 | Strongs #
> 5) a second table is built from the ESV text:
> ESV Word 1 |
> ESV Word 2 |
> 6) these tables are passed to a function which is responsible
> solely for filling in the second column of the second table
> with Strong's numbers.
> 7) the returned table is used to restore the OSIS tags to
> the ESV text, now including word-level Strong's markup.
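Step 6 puts all the matching logic in one function. The sketch below is a naive stand-in for such a function, not Troy's code. It uses Python's `difflib.SequenceMatcher` to find runs where the KJV and ESV word sequences agree (ignoring case and edge punctuation) and copies the KJV Strong's numbers onto those ESV words. Everything else is left as `None` for a human to fill in. `fill_strongs` and the "list of (word, strongs) pairs" table layout are assumptions.

```python
from difflib import SequenceMatcher

def fill_strongs(kjv_table, esv_words):
    """Fill the Strong's column of the ESV word table from the KJV table.

    kjv_table: list of (word, strongs) pairs (strongs may be None).
    esv_words: list of ESV words, i.e. the first column of the second table.
    Returns (word, strongs) pairs; ESV words outside a matching run get None.
    """
    def norm(w):
        # Compare words case-insensitively, ignoring edge punctuation.
        return w.lower().strip(".,;:!?'\"")

    kjv_keys = [norm(w) for w, _ in kjv_table]
    esv_keys = [norm(w) for w in esv_words]
    result = [(w, None) for w in esv_words]
    matcher = SequenceMatcher(None, kjv_keys, esv_keys, autojunk=False)
    for block in matcher.get_matching_blocks():  # last block has size 0
        for k in range(block.size):
            result[block.b + k] = (esv_words[block.b + k],
                                   kjv_table[block.a + k][1])
    return result
```

ESV words that differ from the KJV (e.g. "hovering" where the KJV has "moved") stay unnumbered. That is the residue a human correction step would need to handle.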
> A screenshot of the community collaboration tool for the KJV
> Strong's markup project is at http://crosswire.org/sword/kjv2003/#ss
> We're hoping to convert it to a web application instead of a
> standalone Java GUI, but that hasn't happened yet.
> I'd love to work together on this effort. Please keep me posted
> on any progress and let me know if I can help in any way.
> At 10:18 17/03/2011, Robert Slowley wrote:
>> So, presumably if you could script it to break each chapter into a
>> separate file, do the comparisons, and then re-export as a single file,
>> we could import that into a tool like mine so a human could fix the
>> errors and do the bits the auto-comparison failed to do.
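A sketch of that split-and-rejoin in Python, assuming verse-per-line text whose lines start with "Book ch:v" and have no trailing newline. The comparison itself would run on each chapter's lines between the split and the rejoin. `split_by_chapter`, `write_chapter_files` and `rejoin` are hypothetical helpers, and the file-naming scheme is made up for illustration.

```python
import re
from pathlib import Path

# "Book ch:v " at the start of a line, e.g. "Gen 1:1 In the beginning..."
REF = re.compile(r"^(\S+) (\d+):(\d+)\s")

def split_by_chapter(lines):
    """Group verse-per-line text into {(book, chapter): [lines]}, in order."""
    chapters = {}
    for line in lines:
        m = REF.match(line)
        if not m:
            raise ValueError(f"no 'Book ch:v' reference: {line[:30]!r}")
        key = (m.group(1), int(m.group(2)))
        chapters.setdefault(key, []).append(line)
    return chapters

def write_chapter_files(chapters, out_dir):
    """Write each chapter to its own file, e.g. Gen_001.txt (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for (book, chapter), lines in chapters.items():
        path = out / f"{book}_{chapter:03d}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths

def rejoin(chapters):
    """Re-export the chapters, in their original order, as one list of lines."""
    return [line for lines in chapters.values() for line in lines]
```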
>> On Tue, Mar 15, 2011 at 8:19 AM, David Instone-Brewer
>> <davidinstonebrewer at gmail.com> wrote:
>> > From the automatic comparisons produced by Word, we get:
>> > Gen 1:1 <w H7225>In the beginning,</w> <w H430>God</w> <w
>> > H1254!a>created</w> <w H8064>the heavens</w> <w H776>and the earth</w>.
>> > Gen 1:2 <w H776>The earth</w> was <w H8414>without form</w> <w
>> > H922>and void</w>, and <w H2822>darkness</w> <w H5921>was over</w> the <w
>> > H6440>face</w> <w H8415>of the deep</w>. And <w H7307>the Spirit</w> <w
>> > H430>of God</w> was <w H7363!b>hovering </w> <w H5921>over</w> the <w
>> > H6440>face</w> <w H4325>of the waters </w>.
>> > - i.e. the first two verses are already perfectly tagged. In fact there
>> > aren't any problems in Gen. 1 until we get to v. 5:
>> > Gen 1:5 <w H430>God</w> <w H7121>called</w> <w H216>the light</w> <w
>> > H3117>Day</w>, <w H2822>and the darkness</w> <w H7121>he called</w> <w
>> > H3915>Night.</w>. And <w H6153>there was evening</w> <w H1242>and there was
>> > morningthe first</w>, <w H259>one</w> <w H3117>day</w>.
>> > The problem is that Word gives up making these comparisons after a few
>> > chapters.
>> > Some of these problems can be cleared up by macros.
>> > David IB
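The macros aren't shown here. As a hypothetical Python illustration, this checker flags two kinds of error a comparison run can leave behind: a tag with no closing `</w>`, and a `<w` with no valid Strong's number (e.g. a fragment like `<w void</w>`). It assumes Hebrew `H` or Greek `G` numbers with an optional !a/!b suffix. It cannot spot merged-word errors such as "morningthe first". `check_line` is a hypothetical name.

```python
import re

# An opening tag and whatever sits between "<w" and the next ">".
OPEN = re.compile(r"<w\b([^>]*)>")
# A well-formed attribute part: " H1234", " G25", " H7363!b", ...
VALID_REF = re.compile(r"^ [HG]\d+(?:![a-z])?$")

def check_line(line):
    """Return a list of problem descriptions for one tagged verse line."""
    problems = []
    opens = OPEN.findall(line)
    closes = line.count("</w>")
    if len(opens) != closes:
        problems.append(f"{len(opens)} <w> tags but {closes} </w> tags")
    for attrs in opens:
        if not VALID_REF.match(attrs):
            problems.append(f"tag without a valid Strong's number: <w{attrs}>")
    return problems
```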
>> > At 00:43 15/03/2011, Robert Slowley wrote:
>> >> I think I can produce a better text to produce something which has
>> >> less to correct.
>> > What do you mean?
>> >> It would be useful to have transliterated Hebrew and a single-word
>> >> meaning instead of the numbers.
>> > I have an electronic copy of the stuff you get on popups on
>> > for Strongs already - which I was planning to integrate. If the
>> > numbers are replaced with 'transliterated Hebrew' or a 'single-word
>> > meaning' what specifically would that mean?
>> > For instance on
>> > for the strongs reference h03651, which is the transliterated hebrew,
>> > and which is the single word meaning?
>> >> It would be useful to divide the top line by the tagging, not by any
>> >> English parsing,
>> >> e.g. Gen.1.30 || and to every thing (h3605) ||
>> >> instead of || and to every (h3605) || thing (h3605) ||
>> > In the case of Genesis 1:30 the text behind it is:
>> > NASB: ... <w H3605>and to every</w> <w H3605>thing</w> ...
>> > Presumably there is a reason for the text to have two separate sets of
>> > words both tagged individually with H3605? Or is it just a markup
>> > error?
>> > Presumably in some cases words should be merged if they have the
>> > same Strong's number and are next to each other, but in other cases
>> > they shouldn't be, e.g. Isa 6:3
>> > has:
>> > <w H6918>Holy</w>, <w H6918>Holy</w>, <w H6918>Holy</w>, is the <w
>> > H3068>Lord</w> <w H6635>of hosts</w>
>> > because the Hebrew has swdq repeated 3 times, and I assume that the
>> > reader who understands Strong's gets this indication by it being
>> > repeated rather than there being <w H6918>Holy, Holy, Holy</w>. Is
>> > that right?
>> >> It might be better to have the bottom line with a separate box for each
>> >> word. Sometimes we will want to divide things up differently.
>> > As I see it, we have 'phrases' (a set of one or more words) which may
>> > have one or more Strong's references. In some cases a set of words will
>> > have a shared Strong's reference, but in other cases, like Isa 6:3, sets
>> > of contiguous words may have the same Strong's references but still be
>> > separate 'phrases'. As I see it, there's no way of working this out
>> > automatically.
>> > What I was thinking was to have some algorithm that tries to
>> > automatically map the NASB strongs annotations on to the ESV text,
>> > similar to what I have already crudely done here. That can either try
>> > to group things as the NASB does (where a set of contiguous words
>> > share a strongs reference), or do what I have done here (which is
>> > easier), which is to automatically group words into a 'phrase' where
>> > they share the same strongs references.
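The easier grouping can be sketched with Python's `itertools.groupby`, assuming each verse is already a list of (word, Strong's) pairs. The Isa 6:3 case shows why it can only be a first pass: the three "Holy" words get merged into one phrase, and a human would have to split them again. `group_phrases` is a hypothetical name.

```python
from itertools import groupby

def group_phrases(words):
    """Group contiguous (word, strongs) pairs that share a Strong's number.

    Returns a list of (list_of_words, strongs) phrases. Untagged words
    (strongs=None) are never merged; each becomes its own phrase.
    """
    phrases = []
    for strongs, run in groupby(words, key=lambda pair: pair[1]):
        run_words = [w for w, _ in run]
        if strongs is None:
            phrases.extend(([w], None) for w in run_words)
        else:
            phrases.append((run_words, strongs))
    return phrases

# Gen 1:30 groups as intended; Isa 6:3 gets over-merged.
gen = [("and", "H3605"), ("to", "H3605"), ("every", "H3605"), ("thing", "H3605")]
isa = [("Holy,", "H6918"), ("Holy,", "H6918"), ("Holy,", "H6918"),
       ("is", None), ("the", None), ("LORD", "H3068")]
```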
>> > Either way, not all of the ESV can be automatically annotated in this
>> > way: the annotation will be wrong in some cases, and the automated
>> > grouping may be wrong in some cases. So I was thinking of making the
>> > interface such that, once the automated grouping has been attempted, the
>> > end user can click on a box to select it, then click on
>> > the next box to the left or right (and so on). When this is done, a
>> > button for "merging into a phrase" would appear; if this is
>> > clicked, the selected words would be made into a phrase and could have
>> > their Strong's references assigned. Alternatively, clicking on a box that
>> > represents a phrase of one or more words will cause a "demerge" button to
>> > appear that will separate out all the words. This will allow the end user
>> > to handle both types of situation.
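The merge and demerge buttons amount to two list operations on a verse's phrases. This minimal Python sketch assumes a phrase is a (words, strongs) pair, and `merge` and `demerge` are hypothetical names. Merging leaves the Strong's reference to be assigned afterwards (default `None`). Demerging assumes each freed word keeps the phrase's reference until the user changes it. Both functions return new lists rather than mutating the input, which would make an undo step easy.

```python
def merge(phrases, first, last, strongs=None):
    """Merge the contiguous selection phrases[first..last] (inclusive)."""
    if not 0 <= first <= last < len(phrases):
        raise IndexError("selection out of range")
    words = [w for ws, _ in phrases[first:last + 1] for w in ws]
    return phrases[:first] + [(words, strongs)] + phrases[last + 1:]

def demerge(phrases, index):
    """Split phrases[index] back into single-word phrases.

    Assumption: each word keeps the phrase's Strong's reference.
    """
    words, strongs = phrases[index]
    singles = [([w], strongs) for w in words]
    return phrases[:index] + singles + phrases[index + 1:]
```

For Isa 6:3, demerging the over-merged "Holy, Holy, Holy" phrase gives three separate H6918 phrases, which is the layout the tagged NASB uses there.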
>> > I also thought some sort of "This verse is tagged correctly" button
>> > would be good. In some cases the program will annotate everything, but
>> > it will still need to be checked by a human - and a human may wish
>> > their annotation to be checked by someone else for quality purposes.
>> > When a verse is marked as correct, it can have a tick or something,
>> > and there can be a page of "verses that need work" which it would
>> > automatically be removed from. Does that sound sensible?
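The tick and the "verses that need work" page could be driven by a small per-verse record. This Python sketch assumes that one tick from anyone is enough to mark a verse correct. A stricter rule, such as requiring the checker to be someone other than the tagger, would go in `correct`. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class VerseStatus:
    ref: str                              # e.g. "Gen 1:5"
    tagged_by: Optional[str] = None       # last person to edit the tags
    verified_by: List[str] = field(default_factory=list)

    @property
    def correct(self) -> bool:
        # Assumption: at least one "tagged correctly" tick is enough.
        return bool(self.verified_by)

def mark_correct(statuses: Dict[str, VerseStatus], ref: str, user: str) -> None:
    """Record a 'This verse is tagged correctly' click (once per user)."""
    status = statuses[ref]
    if user not in status.verified_by:
        status.verified_by.append(user)

def needs_work(statuses: Dict[str, VerseStatus]) -> List[str]:
    """Refs for the 'verses that need work' page, in insertion order."""
    return [s.ref for s in statuses.values() if not s.correct]
```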
>> > We have easy access to the SBLGNT (with apparatus) and Leningrad
>> > Codex. Is it worthwhile including those for each verse? I don't know
>> > what process an annotator would go through, and what level of
>> > knowledge of the original languages they would use.
>> > I worked a bit today on tidying up the classes I've written, and
>> > improving the processing of the text (in the next few weeks I'll send
>> > you a list of the suspicious stuff I found while processing your files
>> > ;-) ). I'm away next week for my first anniversary holiday, but
>> > after that I can start work on making this into an actual web app that
>> > would be useful, rather than a static web page demo of the sort of
>> > thing I had in mind.
>> > Any thoughts / comments / ideas appreciated!
>> > It'd probably be a good idea to see if we can improve the automatic
>> > annotation of the ESV from the NASB, as any progress made
>> > here before people start manually annotating / checking will reduce
>> > the number of man-hours needed to complete the task.
>> > -Rob
>> > --
>> > http://www.slowley.com/
>> > "On two occasions, I have been asked [by members of Parliament],
>> > 'Pray, Mr. Babbage, if you put into the machine wrong figures, will
>> > the right answers come out?' I am not able to rightly apprehend the
>> > kind of confusion of ideas that could provoke such a question."
>> > -- Charles Babbage (1791-1871)
> tyndale-devel mailing list
> tyndale-devel at crosswire.org