[jsword-devel] Comparing texts

Chris Burrell chris at burrell.me.uk
Sat Sep 1 07:20:55 MST 2012


Thanks for this. We've decided to stick with the letter option for now, as
it highlights the subtle differences between words like saith and said
rather well.
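
Should we revisit word mode later, the approach on the LineOrWordDiffs page
amounts to mapping each distinct word to a single character, running the
ordinary character diff on the encoded strings, and mapping the result back.
A rough sketch of just the encoding step follows. WordEncoder and its
methods are made-up names, not JSword or diff-match-patch API, and the diff
call itself is left out:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: encodes whitespace-separated words as single chars
// so that a character-based diff effectively compares whole words.
class WordEncoder {
    private final List<String> words = new ArrayList<String>();
    private final Map<String, Character> codes = new HashMap<String, Character>();

    // Replace every word with one char; equal words always get the same char.
    // Assumes far fewer than 65536 distinct words, which holds for a few verses.
    String encode(String text) {
        StringBuilder encoded = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty or all-whitespace input
            }
            Character code = codes.get(word);
            if (code == null) {
                code = Character.valueOf((char) words.size());
                words.add(word);
                codes.put(word, code);
            }
            encoded.append(code.charValue());
        }
        return encoded.toString();
    }

    // Turn an encoded string (or one fragment of a diff result) back into words.
    String decode(String encoded) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            if (i > 0) {
                text.append(' ');
            }
            text.append(words.get(encoded.charAt(i)));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        WordEncoder encoder = new WordEncoder();
        String a = encoder.encode("And God saw the light");
        String b = encoder.encode("And God saw that light");
        // a and b are five chars each and differ only at index 3,
        // so a character diff would report one whole-word change.
        System.out.println(encoder.decode(a));
        System.out.println(encoder.decode(b));
    }
}
```

Both texts have to go through the same encoder instance so that a shared
word gets the same character in each. Original spacing is lost, since
decode rejoins words with single spaces.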

One thing I notice, however, is that the diffing takes account of the
accents in the original text, and I'm not sure how we would avoid this. I'm
guessing there is no easy way to have it ignore them out of the box, apart
from taking the OSIS returned by the call and amending it before the diff
occurs.
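
A minimal sketch of that kind of pre-processing, assuming Java 6 or later
(for java.text.Normalizer) and that the text to compare has already been
extracted from the OSIS as a plain string. AccentStripper and stripAccents
are made-up names, not JSword API:

```java
import java.text.Normalizer;

// Hypothetical helper: removes accents so that a later diff ignores them.
class AccentStripper {
    static String stripAccents(String text) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the Unicode mark categories, i.e. the combining marks.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Greek "logos" with an acute accent on the first omicron (U+03CC).
        String accented = "\u03bb\u03cc\u03b3\u03bf\u03c2";
        System.out.println(stripAccents(accented));
    }
}
```

Note that \p{M} also removes Hebrew vowel points and cantillation marks,
which may or may not be wanted. Both texts would need the same treatment
before being handed to the diff.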

Chris




On 29 August 2012 19:00, DM Smith <dmsmith at crosswire.org> wrote:

> It was based upon an earlier version of diff-match-patch, which was
> written in JavaScript, not Java. The selection criterion I had was that it
> had to have a license compatible with JSword. When the original author was
> hired by Google, the code changed to a license incompatible with porting.
> Since then it was ported to Java 5.
>
> I ported the earlier version to Java 1.4, but I broke it out into multiple
> classes. (We might be able to eliminate our version and use the Google
> version directly.)
>
> I think there is a way to have it do a word based match, but with code
> changes:
> http://code.google.com/p/google-diff-match-patch/wiki/LineOrWordDiffs
>
>
> On Aug 29, 2012, at 12:50 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>
> Hi all
>
> The current diffing produces some fairly strange results from time to
> time. I was wondering how much work it would be to make it do a
> word-by-word diff, rather than letter by letter. I've had a quick scan
> through the diffing engine, but it looks fairly complicated and I can't
> figure out how much of this is a copy of
> http://code.google.com/p/google-diff-match-patch and how much has changed.
>
> In the example below,
>
>            "And God saw th*at th*e light *, that it was good : and God
> divid**was good. And God separat*ed the light from the darkness          "
>
> The new diff would hopefully not chop "that" and "the" in the first
> occurrence above. It would not chop "divid" off either, but rather have
> longer words, which would in turn make things slightly more readable.
>
> (bold indicates strike through)
>
> Chris
>
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel