<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">You might consider using CollateX,
which does token-level (word or other) collation and does a
pretty good job of detecting things like transpositions. Here
is how we use it at the INTF:<br>
<br>
<a class="moz-txt-link-freetext" href="http://ntvmr.uni-muenster.de/web/test/collation?key=Jn.3.16&collate=graph">http://ntvmr.uni-muenster.de/web/test/collation?key=Jn.3.16&collate=graph</a><br>
<br>
Our web service for this is here (with example parameters
following):<br>
<br>
<a class="moz-txt-link-freetext" href="http://ntvmr.uni-muenster.de/community/vmr/api/collate/">http://ntvmr.uni-muenster.de/community/vmr/api/collate/</a><br>
<a class="moz-txt-link-freetext" href="http://ntvmr.uni-muenster.de/community/vmr/api/collate/?w1=Hello+world&l1=x&w2=Hello+cruel+world&format=svg">http://ntvmr.uni-muenster.de/community/vmr/api/collate/?w1=Hello+world&l1=x&w2=Hello+cruel+world&format=svg</a><br>
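<br>
The parameter encoding can be made explicit with a minimal Python sketch that rebuilds the example URL above. It only constructs the URL. The helper name build_collate_url is hypothetical. Reading w1 and w2 as the texts being collated is an assumption based on the example. The meaning of l1 isn't explained here, so the sketch passes it through unchanged.<br>

```python
from urllib.parse import urlencode

# Base URL of the INTF collation web service, as given above.
COLLATE_API = "http://ntvmr.uni-muenster.de/community/vmr/api/collate/"


def build_collate_url(params):
    """Return the collate service URL for an ordered list of (name, value) pairs.

    build_collate_url is a hypothetical helper, not part of the service.
    urlencode() quotes values with quote_plus by default, so spaces become '+'
    and characters such as '&' are percent-encoded.
    """
    return COLLATE_API + "?" + urlencode(params)


# Parameters copied from the example URL; the meaning of l1 is not documented here.
example_url = build_collate_url([
    ("w1", "Hello world"),
    ("l1", "x"),
    ("w2", "Hello cruel world"),
    ("format", "svg"),
])
print(example_url)
```

The resulting string could then be fetched with urllib.request.urlopen, assuming the service is still reachable at that address.<br>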
<br>
On 08/29/2012 06:50 PM, Chris Burrell wrote:<br>
</div>
<blockquote
cite="mid:CACQnaRVwULCn5_mgBsWJ6-hGgw1eE8KccRzugAnRVBJYNsC=og@mail.gmail.com"
type="cite">Hi all
<div><br>
</div>
<div>The current diffing produces some fairly strange results from
time to time. I was wondering how much work it would be to make
it produce a word-by-word diff rather than a letter-by-letter one.
I've had a quick scan through the diffing engine, but it looks
fairly complicated and I can't figure out how much of it is a
copy of <a moz-do-not-send="true"
href="http://code.google.com/p/google-diff-match-patch">http://code.google.com/p/google-diff-match-patch</a>
and how much has changed.</div>
<div><br>
</div>
<div>Here is an example:</div>
<div>
<table class="table">
<tbody>
<tr class="row">
<td dir="ltr" class="cell" valign="top"><br>
"And God saw th<u>at th</u>e light <font
class="strike"><b>, that it was good : and God divid</b></font><u>was
good. And God separat</u>ed the light from the
darkness<font class="strike"> </font> "<br>
<br>
The new diff would hopefully not chop "that" and "the"
in the first occurrence above. Nor would it chop
"divid" off; instead it would keep whole words, which
would in turn make things slightly more readable.<br>
<br>
</td>
</tr>
</tbody>
</table>
</div>
<div>(bold indicates strikethrough)</div>
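<div><br>
</div>
<div>The word-by-word approach can be illustrated with a minimal Python sketch. It splits both readings into word and whitespace tokens and diffs the token lists with the standard library's difflib.SequenceMatcher. This is a swapped-in technique. It is neither JSword's diff engine nor google-diff-match-patch, although the latter's line mode (diff_linesToChars) rests on a similar idea of diffing whole tokens. The function name word_diff and its (op, text) output format are hypothetical.</div>

```python
import re
from difflib import SequenceMatcher


def word_diff(old, new):
    """Diff two strings word by word rather than character by character.

    Hypothetical helper. Returns (op, text) pairs where op is 'equal',
    'delete' or 'insert'. Whitespace runs are kept as separate tokens so
    the pieces join back into the original strings.
    """
    # \S+|\s+ splits into words (punctuation stays attached) and whitespace runs.
    a = re.findall(r"\S+|\s+", old)
    b = re.findall(r"\S+|\s+", new)
    # autojunk=False: don't discard frequent tokens such as single spaces.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.append(("equal", "".join(a[i1:i2])))
            continue
        # 'replace' is reported as a delete followed by an insert.
        if i1 < i2:
            result.append(("delete", "".join(a[i1:i2])))
        if j1 < j2:
            result.append(("insert", "".join(b[j1:j2])))
    return result


print(word_diff("Hello world", "Hello cruel world"))
```

<div>Because every token is a whole word, a changed word such as "divided" is reported as a complete deletion instead of being cut down to "divid". In this simple tokenizer, punctuation stays attached to its word.</div>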
<div><br>
</div>
<div>Chris</div>
<div><br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
jsword-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:jsword-devel@crosswire.org">jsword-devel@crosswire.org</a>
<a class="moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/jsword-devel">http://www.crosswire.org/mailman/listinfo/jsword-devel</a>
</pre>
</blockquote>
<br>
</body>
</html>