<div dir="ltr">It doesn't work on Genesis 1:2 because it never finds a word with enough similarity, so it ends up in an infinite loop within a TODO block :)<br><br>Also, there is already the ESV English-Greek Reverse Interlinear New Testament (<a href="http://www.crossway.org/product/158134628X">http://www.crossway.org/product/158134628X</a>), which is the same sort of thing (NT only, obviously).<br>
<br clear="all">God Bless,<br>Ben<br>-------------------------------------------------------------------------------------------<br>The Lord is not slow to fulfill his promise as some count slowness,<br>but is patient toward you, not wishing that any should perish,<br>
but that all should reach repentance.<br>2 Peter 3:9 (ESV)<br>
<br><br><div class="gmail_quote">On Fri, Sep 12, 2008 at 10:56 AM, Greg Hellings <span dir="ltr"><<a href="mailto:greg.hellings@gmail.com">greg.hellings@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Sorry, I attached a version of the tarball that had the executable in<br>
it and the list moderation caught it. Here's the cleaned version.<br>
See the detailed summary below.<br>
<br>
--Greg<br>
<br>
On Thu, Sep 11, 2008 at 7:52 PM, Greg Hellings <<a href="mailto:greg.hellings@gmail.com">greg.hellings@gmail.com</a>> wrote:<br>
> Troy,<br>
><br>
> The task that I'm currently working on as research for my dissertation<br>
> can possibly be leveraged. We are attempting to sort out image<br>
> annotations (in an effort to learn how to automatically create them).<br>
> As such, we are given a list of terms which annotate the contents of<br>
> an image - but we want to know how similar the semantics of some of<br>
> the terms are. Here is where I think parallels can be drawn:<br>
><br>
> We use established semantic relatedness measurement techniques (see<br>
> <a href="http://wn-similarity.sourceforge.net" target="_blank">wn-similarity.sourceforge.net</a> for some of the best tools currently<br>
> available for that) to construct a graph connecting each term with all<br>
> the other annotating terms, where the edge weight of the graph is the<br>
> value of the average over all of the semantic measures that the<br>
> WordNet Similarity measure returns (in time we will take a weighted<br>
> average with all the values normalized between [0..1], since some<br>
> measures only scale from [0..1/2] and others can take values up to<br>
> 16,000 and more). We then do some strange graph partitioning tricks,<br>
> etc -- that's someone else's domain.<br>
><br>
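A minimal sketch of the normalized, weighted average described above, assuming each measure's score range is known in advance. The function name, the measure names and the ranges here are hypothetical placeholders, not values taken from WordNet::Similarity.<br>

```python
def normalized_average(scores, ranges, weights=None):
    """Scale each measure's score to [0, 1], then take a weighted average.

    scores:  measure name -> raw score
    ranges:  measure name -> (lowest, highest) possible raw score (assumed known)
    weights: measure name -> weight; defaults to equal weights
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = 0.0
    for name, value in scores.items():
        low, high = ranges[name]
        # Clamp so an out-of-range raw score cannot leave [0, 1].
        scaled = min(max((value - low) / (high - low), 0.0), 1.0)
        total += weights[name] * scaled
    return total / sum(weights[name] for name in scores)


# Hypothetical measures: one scaled to [0..1/2], one reaching 16,000.
example_ranges = {"measure_a": (0.0, 0.5), "measure_b": (0.0, 16000.0)}
example_scores = {"measure_a": 0.25, "measure_b": 8000.0}
edge_weight = normalized_average(example_scores, example_ranges)
```

Without the scaling step, the measure that reaches 16,000 would dominate the edge weight and the others would barely register.<br>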
> However, you could possibly utilize the following modification of the<br>
> technique. For each term in the ESV, find the similarity between it<br>
> and every term in the KJV. If they are identical, set the value to 1,<br>
> otherwise, use the WordNet::Similarity tools to produce a value. Then<br>
> weight the value of the link by their relative positions in the text<br>
> (that way two occurrences of the same term can be differentiated), for<br>
> example, divide by abs(position(ESV) - position(KJV)) or something<br>
> similar. Then assign the value for each term based on the word that<br>
> it most closely resembles.<br>
><br>
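The position-weighted matching above could be sketched roughly as follows. This is an illustration, not the code in esvtag.cpp or similarity.py: match_terms is a hypothetical name, the similarity argument stands in for whatever WordNet::Similarity lookup is used, and dividing by 1 + abs(...) rather than abs(...) is an assumption made so that words at the same position do not divide by zero.<br>

```python
def match_terms(esv_words, kjv_words, similarity):
    """For each ESV word, return the index of the best-scoring KJV word.

    similarity(a, b) is a caller-supplied function returning a score in
    [0, 1]; it is only consulted when the two words are not identical.
    A word with no positive score against any KJV word maps to None.
    """
    matches = []
    for i, esv in enumerate(esv_words):
        best_index, best_score = None, 0.0
        for j, kjv in enumerate(kjv_words):
            base = 1.0 if esv == kjv else similarity(esv, kjv)
            # Penalize distance in the text so that repeated words
            # ("the", "and") prefer the nearest occurrence.
            score = base / (1 + abs(i - j))
            if score > best_score:
                best_index, best_score = j, score
        matches.append(best_index)
    return matches


# Stub similarity: only identical words match.
esv = ["In", "the", "beginning", "God", "created"]
kjv = ["In", "the", "beginning", "God", "created", "the", "heaven"]
alignment = match_terms(esv, kjv, lambda a, b: 0.0)
```

Returning None for a word with no match lets the caller skip or flag it rather than keep searching for a candidate that may not exist.<br>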
> This is very similar to what you're already doing, but not identical.<br>
> I have modified the esvtag.cpp to use the included similarity.py to<br>
> get the semantic distance from a few of the metrics that<br>
> WordNet::Similarity uses (however, it scrapes a webpage to do so - you<br>
> will do better, if you decide to use this system, to install the local<br>
> Perl data and run the system locally) whenever the terms are not<br>
> identical. It continues to work for Gen 1:1, but the program pegs out my<br>
> processor and does not appear to have any intention of completing Gen<br>
> 1:2 -- I don't know where the fault for that lies, but it does that<br>
> both in your original version and in this version. Obviously, the<br>
> weighting I proposed would work best when the version being used<br>
> maintains very similar phrase ordering and structuring to the KJV, but<br>
> I suppose any metric we use will require human supervision anyway.<br>
><br>
> As a bonus, I also have it sticking contiguous terms which are part of<br>
> the same source -- "In the beginning" -- into the same &lt;w&gt; tag.<br>
><br>
> --Greg<br>
> P.S. The attached tarball will clobber any current esvtag directory<br>
> that's a child of where you unpack it - so be careful about that.<br>
<div><div></div><div class="Wj3C7c">><br>
> On Thu, Sep 11, 2008 at 4:02 PM, Troy A. Griffitts <<a href="mailto:scribe@crosswire.org">scribe@crosswire.org</a>> wrote:<br>
>> Hey guys. I have a fun and useful challenge for anyone wishing to show off<br>
>> their prowess at problem solving and basic world domination.<br>
>><br>
>><br>
>> We have morphological data for the KJV. Lots of work by many people went<br>
>> into this data, to markup each English word in the Bible text to the<br>
>> corresponding Hebrew or Greek word in the original text.<br>
>><br>
>><br>
>> We have many other Bibles with /similar/ wording to the KJV which are not<br>
>> yet marked up.<br>
>><br>
>><br>
>> Lane Dennis from Crossway (ESV publishers) is here at Tyndale House visiting<br>
>> and we've talked in the past about helping them markup their ESV text to the<br>
>> original.<br>
>><br>
>><br>
>> I have done most all of the grunt work for you!<br>
>><br>
>> Attached is source for a program which attempts to insert &lt;w&gt; markup into<br>
>> the ESV markup using the KJV data.<br>
>><br>
>> It is HEAVILY commented, requires latest SVN of the SWORD engine INSTALLED<br>
>> on your system, both the KJV and ESV modules INSTALLED, and has a nice<br>
>> little method:<br>
>><br>
>> void matchWords(...)<br>
>><br>
>> where you're given:<br>
>> a word list from ESV<br>
>> a word list from KJV<br>
>> a map from KJV word to an XMLTag "&lt;w...&gt;"<br>
>><br>
>> and all you have to do is fill out the equivalent:<br>
>> map from ESV word to an XMLTag.<br>
>><br>
>><br>
>> As a sample, it currently has a really silly algorithm that actually works for<br>
>> Gen.1.1, so you have an example of the work you need to do.<br>
>><br>
>> All you have to do is add the real magic that figures out which words in the<br>
>> ESV map to which words in the KJV (well, you get the idea).<br>
>><br>
>> Have fun! And I'm sure you can see where this is going and how useful it<br>
>> can be for future work!<br>
>><br>
>><br>
>> -Troy.<br>
>><br>
>><br>
>><br>
>><br>
</div></div>>> _______________________________________________<br>
>> sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
>> <a href="http://www.crosswire.org/mailman/listinfo/sword-devel" target="_blank">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>
>> Instructions to unsubscribe/change your settings at above page<br>
>><br>
><br>
<br></blockquote></div><br></div>