[sword-devel] ESV Markup Challenge

Ben Morgan benpmorgan at gmail.com
Thu Sep 11 21:02:35 MST 2008


The reason it doesn't work on Genesis 1:2 is that it doesn't find a word
with enough similarity, so it ends up in an infinite loop within a TODO
block :)

Also, there is already the ESV English-Greek Reverse Interlinear New Testament
(http://www.crossway.org/product/158134628X), which is the same sort
of thing (NT only, obviously).

God Bless,
Ben
-------------------------------------------------------------------------------------------
The Lord is not slow to fulfill his promise as some count slowness,
but is patient toward you, not wishing that any should perish,
but that all should reach repentance.
2 Peter 3:9 (ESV)


On Fri, Sep 12, 2008 at 10:56 AM, Greg Hellings <greg.hellings at gmail.com> wrote:

> Sorry, I attached a version of the tarball that had the executable in
> it and the list moderation caught it.  Here's the cleaned version.
> See the detailed summary below.
>
> --Greg
>
> On Thu, Sep 11, 2008 at 7:52 PM, Greg Hellings <greg.hellings at gmail.com>
> wrote:
> > Troy,
> >
> > The task that I'm currently working on as research for my dissertation
> > can possibly be leveraged.  We are attempting to sort out image
> > annotations (in an effort to learn how to automatically create them).
> > As such, we are given a list of terms which annotate the contents of
> > an image - but we want to know how similar the semantics of some of
> > the terms are.  Here is where I think parallels can be drawn:
> >
> > We use established semantic relatedness measurement techniques (see
> > wn-similarity.sourceforge.net for some of the best tools currently
> > available for that) to construct a graph connecting each term with all
> > the other annotating terms, where the edge weight of the graph is the
> > value of the average over all of the semantic measures that the
> > WordNet Similarity measure returns (in time we will take a weighted
> > average with all the values normalized between [0..1], since some
> > measures only scale from [0..1/2] and others can take values up to
> > 16,000 and more).  We then do some strange graph partitioning tricks,
> > etc -- that's someone else's domain.
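
A rough sketch of the normalized, weighted average described above, assuming each measure has a known value range. The measure names, the ranges, and the function name normalized_average are all hypothetical placeholders, not anything taken from WordNet::Similarity itself.

```python
from typing import Dict, Optional, Tuple


def normalized_average(
    scores: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Rescale each raw score to [0, 1] and return the weighted mean.

    scores  -- raw value per measure (measure names are placeholders)
    ranges  -- assumed (low, high) bounds per measure
    weights -- optional weight per measure; missing weights default to 1.0
    """
    total = 0.0
    weight_sum = 0.0
    for name, raw in scores.items():
        low, high = ranges[name]
        # Clip to the assumed range so an outlier cannot leave [0, 1].
        clipped = min(max(raw, low), high)
        unit = (clipped - low) / (high - low)
        weight = 1.0 if weights is None else weights.get(name, 1.0)
        total += weight * unit
        weight_sum += weight
    # With no measures (or all-zero weights) there is nothing to average.
    return total / weight_sum if weight_sum else 0.0
```

With this, a measure that tops out at 1/2 and one that runs into the thousands contribute on the same scale before they are averaged into an edge weight.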
> >
> > However, you could possibly utilize the following modification of the
> > technique.  For each term in the ESV, find the similarity between it
> > and every term in the KJV.  If they are identical, set the value to 1,
> > otherwise, use the WordNet::Similarity tools to produce a value.  Then
> > weight the value of the link by their relative positions in the text
> > (that way two occurrences of the same term can be differentiated), for
> > example, divide by abs(position(ESV) - position(KJV)) or something
> > similar.  Then assign the value for each term based on the word that
> > it most closely resembles.
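
The proposal above could be sketched roughly as follows. This is only an illustration, not the code in esvtag.cpp: match_words is a hypothetical name, the similarity argument is a stand-in for whatever WordNet::Similarity-based score is available, and the "+ 1" in the denominator is an assumption added so that words at the same position do not divide by zero.

```python
from typing import Callable, Dict, List


def match_words(
    esv: List[str],
    kjv: List[str],
    similarity: Callable[[str, str], float],
) -> Dict[int, int]:
    """Map each ESV word index to the index of its best-scoring KJV word.

    Identical words score 1.0; otherwise the supplied similarity function
    is used. The score is then divided by the positional distance (plus 1)
    so that repeated words are matched to the nearest occurrence.
    """
    matches: Dict[int, int] = {}
    for i, esv_word in enumerate(esv):
        best_index = None
        best_score = 0.0
        for j, kjv_word in enumerate(kjv):
            if esv_word.lower() == kjv_word.lower():
                base = 1.0
            else:
                base = similarity(esv_word, kjv_word)
            score = base / (1 + abs(i - j))
            if score > best_score:
                best_index, best_score = j, score
        # Words with no positive score are left unmatched rather than forced.
        if best_index is not None:
            matches[i] = best_index
    return matches
```

Each ESV word is scored against every KJV word in a single pass and simply left out of the result when nothing scores above zero, so the loop always terminates. A real version would probably want a minimum-score threshold as well.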
> >
> > This is very similar to what you're already doing, but not identical.
> > I have modified the esvtag.cpp to use the included similarity.py to
> > get the semantic distance from a few of the metrics that
> > WordNet::Similarity uses (however, it scrapes a webpage to do so - you
> > will do better, if you decide to use this system, to install the local
> > Perl data and run the system locally) whenever the terms are not
> > identical.  It continues to work for Gen 1:1, but the program pegs out
> > my processor and does not appear to have any intention of completing
> > Gen 1:2 -- I don't know where the fault for that lies, but it does
> > that both in your original version and in this version.  Obviously, the
> > weighting I proposed would work best when the version being used
> > maintains very similar phrase ordering and structuring to the KJV, but
> > I suppose any metric we use will require human supervision anyway.
> >
> > As a bonus, I also have it sticking contiguous terms which are part of
> > the same source -- "In the beginning" -- into the same <w> tag.
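
Merging contiguous terms into one tag can be sketched like this, assuming each word already carries the opening "<w...>" tag it was matched to (or None if unmatched). The function name wrap_contiguous and the tag strings used with it are placeholders; the real tags would come from the KJV data.

```python
from itertools import groupby
from typing import List, Optional


def wrap_contiguous(words: List[str], tags: List[Optional[str]]) -> str:
    """Wrap runs of adjacent words that share the same opening tag.

    words -- the verse split into words
    tags  -- parallel list holding each word's opening "<w...>" tag,
             or None for words with no match
    """
    pieces = []
    # groupby only merges *adjacent* items with equal keys, which is
    # exactly the "contiguous terms from the same source" condition.
    for tag, group in groupby(zip(words, tags), key=lambda pair: pair[1]):
        text = " ".join(word for word, _ in group)
        if tag is None:
            pieces.append(text)
        else:
            pieces.append(f"{tag}{text}</w>")
    return " ".join(pieces)
```

Given three words that all point at the same tag followed by one that points elsewhere, the first three come out inside a single <w> element and the fourth gets its own.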
> >
> > --Greg
> > P.S. The attached tarball will clobber any current esvtag directory
> > that's a child of where you unpack it - so be careful about that.
> >
> > On Thu, Sep 11, 2008 at 4:02 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
> >> Hey guys.  I have a fun and useful challenge for anyone wishing to show
> >> off their prowess at problem solving and basic world domination.
> >>
> >>
> >> We have morphological data for the KJV.  Lots of work by many people
> >> went into this data, to mark up each English word in the Bible text to
> >> the corresponding Hebrew or Greek word in the original text.
> >>
> >>
> >> We have many other Bibles with /similar/ wording to the KJV which are
> >> not yet marked up.
> >>
> >>
> >> Lane Dennis from Crossway (ESV publishers) is here at Tyndale House
> >> visiting and we've talked in the past about helping them mark up their
> >> ESV text to the original.
> >>
> >>
> >> I have done most all of the grunt work for you!
> >>
> >> Attached is source for a program which attempts to insert <w> markup
> >> into the ESV markup using the KJV data.
> >>
> >> It is HEAVILY commented, requires the latest SVN of the SWORD engine
> >> INSTALLED on your system, both the KJV and ESV modules INSTALLED, and
> >> has a nice little method:
> >>
> >> void matchWords(...)
> >>
> >> where you're given:
> >> a word list from ESV
> >> a word list from KJV
> >> a map from KJV word to an XMLTag "<w...>"
> >>
> >> and all you have to do is fill out the equivalent:
> >> map from ESV word to an XMLTag.
> >>
> >>
> >> As a sample, it currently has a really silly algorithm that actually
> >> works for Gen.1.1, so you have an example of the work you need to do.
> >>
> >> All you have to do is add the real magic that figures out which words
> >> in the ESV map to which words in the KJV (well, you get the idea).
> >>
> >> Have fun!  And I'm sure you can see where this is going and how useful
> >> it can be for future work!
> >>
> >>
> >>        -Troy.
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> sword-devel mailing list: sword-devel at crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/sword-devel
> >> Instructions to unsubscribe/change your settings at above page
> >>
> >
>