[sword-devel] ESV Markup Challenge
Troy A. Griffitts
scribe at crosswire.org
Fri Sep 12 03:10:19 MST 2008
Hey Ben and Greg,
Thanks for your input. I might have time to look into Greg's
recommendation. Now that really sounds like an impressive way to choose
similar terms. And thanks for completing the insertWordTags to put
consecutive words together! Ben, thanks for finding where the program
locks up! I'll try to have a look, if no one gets to it before me. The
Interlinear NT you point out has an interesting story. Apparently
Crossway paid the Logos guys to mark up their NT for them and has to
license the data back from Logos. This keeps Crossway from being able
to give that data to us or any other freely distributable project. We'd
like to help them with better data under a better license :)
Thanks everyone for your feedback and interest. Please keep working on
things. This is 1 of like 5 tracks of work on my whiteboard so by all
means, keep working! :)
-Troy.
Ben Morgan wrote:
> The reason it doesn't work on Genesis 1:2 is because it doesn't find a
> word with enough similarity, so it ends up in an infinite loop within a
> TODO block :)
>
> Also, there is the ESV English-Greek Reverse Interlinear New Testament
> (http://www.crossway.org/product/158134628X) already, which is the same
> sort of thing (NT only, obviously).
>
> God Bless,
> Ben
> -------------------------------------------------------------------------------------------
> The Lord is not slow to fulfill his promise as some count slowness,
> but is patient toward you, not wishing that any should perish,
> but that all should reach repentance.
> 2 Peter 3:9 (ESV)
>
>
> On Fri, Sep 12, 2008 at 10:56 AM, Greg Hellings <greg.hellings at gmail.com> wrote:
>
> Sorry, I attached a version of the tarball that had the executable in
> it and the list moderation caught it. Here's the cleaned version.
> See the detailed summary below.
>
> --Greg
>
> On Thu, Sep 11, 2008 at 7:52 PM, Greg Hellings <greg.hellings at gmail.com> wrote:
> > Troy,
> >
> > The task that I'm currently working on as research for my dissertation
> > can possibly be leveraged. We are attempting to sort out image
> > annotations (in an effort to learn how to automatically create them).
> > As such, we are given a list of terms which annotate the contents of
> > an image - but we want to know how similar the semantics of some of
> > the terms are. Here is where I think parallels can be drawn:
> >
> > We use established semantic relatedness measurement techniques (see
> > wn-similarity.sourceforge.net for some of the best tools currently
> > available for that) to construct a graph connecting each term with all
> > the other annotating terms, where the edge weight of the graph is the
> > value of the average over all of the semantic measures that the
> > WordNet Similarity measure returns (in time we will take a weighted
> > average with all the values normalized between [0..1], since some
> > measures only scale from [0..1/2] and others can take values up to
> > 16,000 and more). We then do some strange graph partitioning tricks,
> > etc -- that's someone else's domain.
> >
> > However, you could possibly utilize the following modification of the
> > technique. For each term in the ESV, find the similarity between it
> > and every term in the KJV. If they are identical, set the value to 1,
> > otherwise, use the WordNet::Similarity tools to produce a value. Then
> > weight the value of the link by their relative positions in the text
> > (that way two occurrences of the same term can be differentiated), for
> > example, divide by abs(position(ESV) - position(KJV)) or something
> > similar. Then assign the value for each term based on the word that
> > it most closely resembles.
> >
> > This is very similar to what you're already doing, but not identical.
> > I have modified the esvtag.cpp to use the included similarity.py to
> > get the semantic distance from a few of the metrics that
> > WordNet::Similarity uses (however, it scrapes a webpage to do so - you
> > will do better, if you decide to use this system, to install the local
> > Perl data and run the system locally) whenever the terms are not
> > identical. It continues to work for Gen 1:1, but the program pegs out my
> > processor and does not appear to have any intention of completing Gen
> > 1:2 -- I don't know where the fault for that lies, but it does that
> > both in your original version and in this version. Obviously, the
> > weighting I proposed would work best when the version being used
> > maintains very similar phrase ordering and structuring to the KJV, but
> > I suppose any metric we use will require human supervision anyway.
> >
> > As a bonus, I also have it sticking contiguous terms which are part of
> > the same source -- "In the beginning" -- into the same <w> tag.
> >
> > --Greg
> > P.S. The attached tarball will clobber any current esvtag directory
> > that's a child of where you unpack it - so be careful about that.
> >
> > On Thu, Sep 11, 2008 at 4:02 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
> >> Hey guys. I have a fun and useful challenge for anyone wishing to show off
> >> their prowess at problem solving and basic world domination.
> >>
> >>
> >> We have morphological data for the KJV. Lots of work by many people went
> >> into this data, to mark up each English word in the Bible text to the
> >> corresponding Hebrew or Greek word in the original text.
> >>
> >>
> >> We have many other Bibles with /similar/ wording to the KJV which are not
> >> yet marked up.
> >>
> >>
> >> Lane Dennis from Crossway (ESV publishers) is here at Tyndale House visiting
> >> and we've talked in the past about helping them mark up their ESV text to the
> >> original.
> >>
> >>
> >> I have done most all of the grunt work for you!
> >>
> >> Attached is source for a program which attempts to insert <w> markup into
> >> the ESV markup using the KJV data.
> >>
> >> It is HEAVILY commented, requires latest SVN of the SWORD engine INSTALLED
> >> on your system, both the KJV and ESV modules INSTALLED, and has a nice
> >> little method:
> >>
> >> void matchWords(...)
> >>
> >> where you're given:
> >> a word list from ESV
> >> a word list from KJV
> >> a map from KJV word to an XMLTag "<w...>"
> >>
> >> and all you have to do is fill out the equivalent:
> >> map from ESV word to an XMLTag.
> >>
> >>
> >> As a sample, it currently has a really silly algorithm that actually works for
> >> Gen.1.1, so you have an example of the work you need to do.
> >>
> >> All you have to do is add the real magic that figures out which words in the
> >> ESV map to which words in the KJV (well, you get the idea).
> >>
> >> Have fun! And I'm sure you can see where this is going and how useful it
> >> can be for future work!
> >>
> >>
> >> -Troy.
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> sword-devel mailing list: sword-devel at crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/sword-devel
> >> Instructions to unsubscribe/change your settings at above page
> >>
> >
>
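
A rough Python sketch of the position-weighted matching Greg describes in
the quoted message, for anyone who wants to experiment. Everything named
here is an assumption for illustration: match_words only mirrors the shape
of matchWords(...) and is not the code in esvtag.cpp, exact_or_zero is a
placeholder standing in for a WordNet::Similarity score, the tags are dummy
strings, and the threshold value is arbitrary. The divisor is
1 + abs(i - j) rather than the bare abs(...) from the message, so that two
words at the same position do not divide by zero. A word with no candidate
above the threshold is simply left untagged, which is one possible way to
avoid searching forever when nothing is similar enough.

```python
from typing import Callable, Dict, List, Optional


def exact_or_zero(a: str, b: str) -> float:
    """Placeholder measure: 1.0 for identical words, otherwise 0.0.

    A real run would plug in a semantic relatedness score here.
    """
    return 1.0 if a.lower() == b.lower() else 0.0


def match_words(
    esv_words: List[str],
    kjv_words: List[str],
    kjv_tags: Dict[int, str],
    similarity: Callable[[str, str], float] = exact_or_zero,
    threshold: float = 0.2,  # arbitrary cut-off, chosen for illustration
) -> Dict[int, str]:
    """Map ESV word positions to the tag of the best-scoring KJV word."""
    esv_tags: Dict[int, str] = {}
    for i, esv_word in enumerate(esv_words):
        best_score = 0.0
        best_j: Optional[int] = None
        for j, kjv_word in enumerate(kjv_words):
            if j not in kjv_tags:
                continue  # this KJV word carries no markup to copy
            # Damp the raw score by how far apart the two words sit.
            score = similarity(esv_word, kjv_word) / (1 + abs(i - j))
            if score > best_score:
                best_score, best_j = score, j
        # Leave the word untagged instead of retrying when nothing fits.
        if best_j is not None and best_score >= threshold:
            esv_tags[i] = kjv_tags[best_j]
    return esv_tags


# Dummy tags keyed by KJV word position; real tags would come from the module.
kjv = "In the beginning God created the heaven and the earth".split()
esv = "In the beginning God created the heavens and the earth".split()
tags = {j: "<w n='%d'>" % j for j in range(len(kjv))}

result = match_words(esv, kjv, tags)
untagged = [esv[i] for i in range(len(esv)) if i not in result]
print(untagged)  # ['heavens'] with the placeholder measure
```

The three occurrences of "the" each pick the KJV "the" at the same
position, which is the point of the position weighting. "heavens" stays
untagged here only because the placeholder measure knows nothing about
near-synonyms; passing a semantic measure as the similarity argument is
where Greg's suggestion would come in.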