[jsword-devel] Comparing texts
Troy A. Griffitts
scribe at crosswire.org
Wed Aug 29 10:15:06 MST 2012
You might consider using CollateX, which does token-level (word or
other) collation and does a pretty good job of detecting things like
transpositions. Here is how we use it at the INTF:
http://ntvmr.uni-muenster.de/web/test/collation?key=Jn.3.16&collate=graph
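
For a rough sense of what a token-level comparison looks like, here is a minimal sketch in Python using the standard library's difflib. It is neither CollateX nor the JSword diff engine, and it does not detect transpositions. The function name word_diff and the whitespace tokenization are assumptions made purely for illustration.

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Compare two texts word by word.

    Returns (op, text) pairs where op is 'equal', 'delete', or 'insert'.
    Tokenization is plain whitespace splitting (an assumption; a real
    collation tool would normalize punctuation, case, and so on).
    """
    a, b = old.split(), new.split()
    # The sequences are lists of words, so whole words are never split.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", " ".join(a[i1:i2])))
        else:
            # 'replace' is reported as a deletion followed by an insertion.
            if i1 < i2:
                result.append(("delete", " ".join(a[i1:i2])))
            if j1 < j2:
                result.append(("insert", " ".join(b[j1:j2])))
    return result


if __name__ == "__main__":
    for op, text in word_diff("Hello world", "Hello cruel world"):
        print(op, text)
```

Because the matcher works on lists of words rather than strings of characters, a changed word shows up as a whole-word deletion plus insertion instead of a shared prefix with a chopped-off tail.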
Our web service for this is here (with example parameters following):
http://ntvmr.uni-muenster.de/community/vmr/api/collate/
http://ntvmr.uni-muenster.de/community/vmr/api/collate/?w1=Hello+world&l1=x&w2=Hello+cruel+world&format=svg
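
The example URL above can also be assembled programmatically. This is a minimal sketch that assumes only the parameters visible in that example (w1, l1, w2, format); the helper name collate_url is hypothetical, and nothing here implies what other parameters the service accepts. It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint taken from the message above.
COLLATE_ENDPOINT = "http://ntvmr.uni-muenster.de/community/vmr/api/collate/"


def collate_url(w1: str, l1: str, w2: str, fmt: str = "svg") -> str:
    """Build a collate request URL from the parameters seen in the example.

    collate_url is a hypothetical helper; only w1, l1, w2, and format
    appear in the original example, so only those are used here.
    """
    # A list of pairs keeps the parameter order the same as in the example.
    params = [("w1", w1), ("l1", l1), ("w2", w2), ("format", fmt)]
    # urlencode escapes spaces as '+', matching the example URL.
    return COLLATE_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(collate_url("Hello world", "x", "Hello cruel world"))
```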
On 08/29/2012 06:50 PM, Chris Burrell wrote:
> Hi all
>
> The current diffing produces some fairly strange results from time to
> time. I was wondering how much work it would be to make it do a
> word-by-word diff, rather than letter-by-letter. I've had a quick scan
> through the diffing engine, but it looks fairly complicated and I can't
> figure out how much of it is a copy of
> http://code.google.com/p/google-diff-match-patch and how much has changed.
>
> In the example below,
>
> "And God saw th_at th_e light *, that it was good : and God
> divid*_was good. And God separat_ed the light from the darkness "
>
> The new diff would hopefully not chop "that" and "the" in the first
> occurrence above. It would not chop "divid" off either, but would rather
> keep whole words, which would in turn make things slightly more readable.
>
> (bold indicates strike through)
>
> Chris