[jsword-devel] Comparing texts

Troy A. Griffitts scribe at crosswire.org
Wed Aug 29 10:15:06 MST 2012


You might consider using CollateX, which does token-level (word or 
other) collation and does a pretty good job of detecting things like 
transpositions.  Here is how we use it here at the INTF:

http://ntvmr.uni-muenster.de/web/test/collation?key=Jn.3.16&collate=graph
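
To make the token-level idea concrete, here is a minimal sketch of a
word-by-word comparison. It uses Python's standard difflib rather than
CollateX, so it only illustrates comparing whole words instead of letters
and will not detect transpositions. The function name word_diff and the
whitespace tokenisation are assumptions made for illustration.

```python
import difflib


def word_diff(a, b):
    """Compare two strings word by word (hypothetical helper, not CollateX).

    Returns a list of (tag, words_from_a, words_from_b) tuples, where tag
    is one of 'equal', 'insert', 'delete' or 'replace'.
    """
    # Assumption: tokens are whitespace-separated words.
    a_tokens = a.split()
    b_tokens = b.split()
    # Matching on lists of words means a word is never split in the middle.
    matcher = difflib.SequenceMatcher(None, a_tokens, b_tokens, autojunk=False)
    return [
        (tag, a_tokens[i1:i2], b_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    for tag, old, new in word_diff("Hello world", "Hello cruel world"):
        print(tag, old, new)
```

For the two sample strings "Hello world" and "Hello cruel world" this
reports "cruel" as a single inserted word, with "Hello" and "world"
unchanged.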

Our web service for this is here (with example parameters following):

http://ntvmr.uni-muenster.de/community/vmr/api/collate/
http://ntvmr.uni-muenster.de/community/vmr/api/collate/?w1=Hello+world&l1=x&w2=Hello+cruel+world&format=svg
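
The example request above can also be assembled programmatically. The
sketch below only builds the URL and sends nothing over the network. The
parameter names w1, l1, w2 and format are taken from the example; reading
them as witness text, witness label and output format is an assumption,
as is the helper name build_collate_url.

```python
from urllib.parse import urlencode

# Endpoint as given in this message.
COLLATE_URL = "http://ntvmr.uni-muenster.de/community/vmr/api/collate/"


def build_collate_url(w1, l1, w2, fmt="svg"):
    """Build a collate request URL (hypothetical helper).

    Assumption: w1/w2 carry the texts to collate, l1 is a label for the
    first text, and format selects the output type, as in the example URL.
    """
    # urlencode escapes spaces as '+', matching the example request.
    params = {"w1": w1, "l1": l1, "w2": w2, "format": fmt}
    return COLLATE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_collate_url("Hello world", "x", "Hello cruel world"))
```

Called with the values from the example, this prints the same URL as the
second line above.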




On 08/29/2012 06:50 PM, Chris Burrell wrote:
> Hi all
>
> The current diffing produces some fairly strange results from time to 
> time. I was wondering how much work it would be to make it do a 
> word-by-word diff, rather than letter-by-letter. I've had a quick scan 
> through the diffing engine, but it looks fairly complicated and I can't 
> figure out how much of it is a copy of 
> http://code.google.com/p/google-diff-match-patch and how much has changed.
>
> In the example below,
>
>            "And God saw th_at th_e light *, that it was good : and God 
> divid*_was good. And God separat_ed the light from the darkness         "
>
> The new diff would hopefully not chop "that" and "the" in the first 
> occurrence above. It would not chop "divid" off either, but would 
> rather keep whole words, which in turn would make things slightly more 
> readable.
>
> (bold indicates strike through)
>
> Chris
