[sword-devel] ESV Markup Challenge

Greg Hellings greg.hellings at gmail.com
Thu Sep 11 17:56:01 MST 2008


Sorry, the tarball I attached included the executable, so the list
moderation caught it.  Here's the cleaned version; see the detailed
summary below.

--Greg

On Thu, Sep 11, 2008 at 7:52 PM, Greg Hellings <greg.hellings at gmail.com> wrote:
> Troy,
>
> The task I'm currently working on as research for my dissertation
> might be leveraged here.  We are attempting to sort out image
> annotations (in an effort to learn how to create them automatically).
> We are given a list of terms that annotate the contents of an image,
> and we want to know how semantically similar some of those terms are.
> Here is where I think parallels can be drawn:
>
> We use established semantic relatedness measurement techniques (see
> wn-similarity.sourceforge.net for some of the best tools currently
> available) to construct a graph connecting each term to every other
> annotating term, where the edge weight is the average of the values
> returned by the various WordNet::Similarity measures.  In time we will
> take a weighted average with all values normalized to [0, 1], since
> some measures only range over [0, 1/2] while others can reach 16,000
> or more.  We then do some strange graph partitioning tricks, etc.;
> that's someone else's domain.
>
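A minimal Python sketch of the edge-weight computation described above, assuming each measure's score range is known in advance. The measure names and ranges in MEASURE_RANGES are hypothetical placeholders, not the actual WordNet::Similarity measure list; min-max rescaling is simply one straightforward way to put the measures on a common [0, 1] scale before averaging.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical measure names and (min, max) ranges; the real ranges depend on
# which WordNet::Similarity measures are used and how they are configured.
MEASURE_RANGES: Dict[str, Tuple[float, float]] = {
    "path": (0.0, 1.0),
    "half_scale": (0.0, 0.5),
    "large_scale": (0.0, 16000.0),
}

def normalize(measure: str, value: float) -> float:
    """Clamp a raw score to its measure's range, then rescale it into [0, 1]."""
    lo, hi = MEASURE_RANGES[measure]
    value = min(max(value, lo), hi)
    return (value - lo) / (hi - lo)

def edge_weight(scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """(Weighted) average of the normalized scores; equal weights by default."""
    if weights is None:
        weights = {m: 1.0 for m in scores}
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * normalize(m, v) for m, v in scores.items()) / total

def build_graph(terms: List[str],
                score_fn: Callable[[str, str], Dict[str, float]]
                ) -> Dict[Tuple[str, str], float]:
    """Complete graph over the terms: {(a, b): edge weight} for each pair."""
    return {(a, b): edge_weight(score_fn(a, b)) for a, b in combinations(terms, 2)}
```

For example, raw scores of 0.5, 0.25 and 8000 on the three placeholder measures each normalize to 0.5, so the edge weight is 0.5. Passing a weights dictionary gives the weighted average mentioned as future work.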
> However, you could use the following modification of the technique.
> For each term in the ESV, find the similarity between it and every
> term in the KJV.  If they are identical, set the value to 1;
> otherwise, use the WordNet::Similarity tools to produce a value.  Then
> weight the value of each link by the terms' relative positions in the
> text (so that two occurrences of the same term can be differentiated),
> for example by dividing by abs(position(ESV) - position(KJV)) or
> something similar.  Then assign each ESV term the markup of the KJV
> word it most closely resembles.
>
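The matching step above might look roughly like the following Python sketch. match_words and its arguments are hypothetical stand-ins for the inputs of the C++ matchWords(...) method; similarity is any callable returning a score in [0, 1] (for instance a wrapper around WordNet::Similarity, not shown). Two details are assumptions rather than part of the proposal: identity is checked case-insensitively, and the distance penalty uses 1 + abs(difference) so that equal positions do not divide by zero.

```python
from typing import Callable, Dict, List

def position_weighted_score(sim: float, esv_pos: int, kjv_pos: int) -> float:
    # The proposal divides by |pos(ESV) - pos(KJV)|; adding 1 is an assumption
    # made here so that equal positions do not divide by zero.
    return sim / (1 + abs(esv_pos - kjv_pos))

def match_words(esv_words: List[str], kjv_words: List[str],
                kjv_tags: Dict[int, str],
                similarity: Callable[[str, str], float]) -> Dict[int, str]:
    """Map each ESV word index to the tag of its best-scoring tagged KJV word."""
    esv_tags: Dict[int, str] = {}
    for i, ew in enumerate(esv_words):
        best_j, best_score = None, 0.0
        for j, kw in enumerate(kjv_words):
            if j not in kjv_tags:
                continue  # only tagged KJV words can donate markup
            # Identical words score 1; otherwise ask the (pluggable) measure.
            sim = 1.0 if ew.lower() == kw.lower() else similarity(ew, kw)
            score = position_weighted_score(sim, i, j)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            esv_tags[i] = kjv_tags[best_j]
    return esv_tags
```

With the Gen 1:1 word lists, the ESV "the" at position 5 picks the KJV "the" at position 5 over the ones at positions 1 and 8, which is the differentiation the position weighting is meant to provide. ESV words whose best score is 0 are left untagged.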
> This is very similar to what you're already doing, but not identical.
> I have modified esvtag.cpp to call the included similarity.py, which
> gets the semantic distance from a few of the metrics WordNet::Similarity
> provides, whenever the terms are not identical.  (It does so by scraping
> a webpage; if you decide to use this system, you will do better to
> install the Perl package and its data locally and run it there.)  It
> still works for Gen 1:1, but the program pegs my processor and shows
> no sign of completing Gen 1:2.  I don't know where the fault lies, but
> it happens both in your original version and in this one.  Obviously,
> the weighting I proposed works best when the version being used keeps
> phrase ordering and structure very similar to the KJV's, but I suppose
> any metric we use will require human supervision anyway.
>
> As a bonus, it also groups contiguous terms that come from the same
> source word (e.g. "In the beginning") into a single <w> tag.
>
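Grouping contiguous terms into one <w> element can be sketched with Python's itertools.groupby, which groups consecutive items that share a key. The tags dictionary (word index to attribute string) and the attribute format are assumptions for illustration only. One caveat of this approach: adjacent words that actually come from two different source words but carry identical attributes would also be merged; keying on a source-word index instead would avoid that.

```python
from itertools import groupby
from typing import Dict, List
from xml.sax.saxutils import escape

def group_contiguous(words: List[str], tags: Dict[int, str]) -> str:
    """Merge runs of adjacent words sharing the same attributes into one <w>.

    words: ESV words in verse order.
    tags:  {word index: attribute string}, e.g. 'lemma="..."' (hypothetical
           format); words with no entry are emitted as plain text.
    """
    out = []
    # groupby only merges *consecutive* indices with equal keys.
    for attrs, run in groupby(range(len(words)), key=lambda i: tags.get(i)):
        text = " ".join(escape(words[i]) for i in run)
        out.append(f"<w {attrs}>{text}</w>" if attrs else text)
    return " ".join(out)
```

Given the words "In the beginning God created" with the first three sharing one attribute string and "God" another, this yields one <w> around "In the beginning", one around "God", and leaves the untagged "created" as plain text.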
> --Greg
> P.S. The attached tarball will clobber any existing esvtag directory
> inside the directory where you unpack it, so be careful about that.
>
> On Thu, Sep 11, 2008 at 4:02 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
>> Hey guys.  I have a fun and useful challenge for anyone wishing to show off
>> their prowess at problem solving and basic world domination.
>>
>>
>> We have morphological data for the KJV.  Lots of work by many people went
>> into this data, marking up each English word in the Bible text with the
>> corresponding Hebrew or Greek word in the original text.
>>
>>
>> We have many other Bibles with /similar/ wording to the KJV which are not
>> yet marked up.
>>
>>
>> Lane Dennis from Crossway (ESV publishers) is here at Tyndale House visiting,
>> and we've talked in the past about helping them mark up their ESV text to the
>> original.
>>
>>
>> I have done most all of the grunt work for you!
>>
>> Attached is source for a program which attempts to insert <w> markup into
>> the ESV markup using the KJV data.
>>
>> It is HEAVILY commented, requires the latest SVN of the SWORD engine INSTALLED
>> on your system and both the KJV and ESV modules INSTALLED, and has a nice
>> little method:
>>
>> void matchWords(...)
>>
>> where you're given:
>> a word list from ESV
>> a word list from KJV
>> a map from KJV word to an XMLTag "<w...>"
>>
>> and all you have to do is fill out the equivalent:
>> map from ESV word to an XMLTag.
>>
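For comparison, a hedged Python sketch of a simple baseline for the same inputs, using difflib.SequenceMatcher from the Python standard library to align the two word lists and copy tags across exact matches. This is a swapped-in technique, not the algorithm in Troy's program (which isn't shown here); keying kjv_tags by word index rather than by word string is also an assumption, made so that repeated words such as "the" stay distinct.

```python
from difflib import SequenceMatcher
from typing import Dict, List

def match_words_difflib(esv_words: List[str], kjv_words: List[str],
                        kjv_tags: Dict[int, str]) -> Dict[int, str]:
    """Copy KJV tags onto ESV words that difflib aligns with identical words."""
    sm = SequenceMatcher(None,
                         [w.lower() for w in kjv_words],
                         [w.lower() for w in esv_words],
                         autojunk=False)
    esv_tags: Dict[int, str] = {}
    # Each block says kjv[a:a+size] == esv[b:b+size]; the final block is a
    # size-0 sentinel, so the inner loop simply skips it.
    for block in sm.get_matching_blocks():
        for k in range(block.size):
            kjv_i, esv_i = block.a + k, block.b + k
            if kjv_i in kjv_tags:
                esv_tags[esv_i] = kjv_tags[kjv_i]
    return esv_tags
```

On Gen 1:1 this aligns every word except the ESV "heavens" (KJV "heaven"), which is exactly the kind of near-match that needs something smarter, such as a similarity measure.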
>>
>> As a sample, it currently has a really silly algorithm that actually works for
>> Gen.1.1, so you have an example of the work you need to do.
>>
>> All you have to do is add the real magic that figures out which words in the
>> ESV map to which words in the KJV (well, you get the idea).
>>
>> Have fun!  And I'm sure you can see where this is going and how useful it
>> can be for future work!
>>
>>
>>        -Troy.
>>
>>
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: esvtag.tar.gz
Type: application/x-gzip
Size: 4978 bytes
Desc: not available
Url : http://www.crosswire.org/pipermail/sword-devel/attachments/20080911/1ec2ef69/attachment.gz 

