[jsword-devel] Diff-ing, casing and punctuation

Chris Burrell chris at burrell.me.uk
Sun Nov 3 09:45:48 MST 2013


Hello

I'd like our BookData diff-ing tool to be able to ignore case and
punctuation. I've had a brief look through the code, but it seems to be
quite complex, and there does not seem to be an easy place to insert such
a check.

The diffs seem to be done at various points in the system, so there is no
obvious single place to add a flag for ignoring the punctuation/casing of
words. Without such a flag, the diff often reports trivial differences
that are not changes in meaning.

I'm not sure what the best approach is, but I found that this particular
diff tool, https://code.google.com/p/java-diff-utils, allows you to specify
an Equalizer.
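
To make the idea concrete, here is a minimal sketch of the kind of equality
check such a hook could delegate to. It deliberately uses only the JDK
(java.util.function.BiPredicate) and does not reproduce the java-diff-utils
Equalizer interface; the class and member names (LenientWordEquality, key,
SAME_WORD) are hypothetical, and "punctuation" is assumed to mean the
Unicode punctuation categories.

```java
import java.util.Locale;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of JSword or java-diff-utils.
final class LenientWordEquality {
    // Builds a comparison key: removes Unicode punctuation (\p{P}), then lower-cases.
    // Assumption: symbols such as '+' or '$' are not punctuation and are kept.
    static String key(String word) {
        return word.replaceAll("\\p{P}", "").toLowerCase(Locale.ROOT);
    }

    // Two words count as the same when their comparison keys match.
    static final BiPredicate<String, String> SAME_WORD =
            (a, b) -> key(a).equals(key(b));
}
```

With this, LenientWordEquality.SAME_WORD.test("LORD;", "Lord") is true, while
two genuinely different words still compare as different. Whether it can be
plugged into a given diff engine depends on that engine exposing a
comparable hook.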

I had previously added an 'unaccented' to ignore accents. This works fine,
but results in the displayed text having no accents. The ideal tool would
be something that compared the text in both XML fragments and marked up
the existing XML rather than generating new elements.
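
One way to avoid losing the accents in the displayed text is to compare on
a normalised key but emit the original words. The following is only a
sketch under several assumptions: it works on plain whitespace-separated
words rather than XML, it uses a simple LCS-based word diff rather than
JSword's existing diff code, and the names (KeyedDiffSketch, key, diff) are
hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not JSword code.
final class KeyedDiffSketch {
    // Comparison key: decompose, drop combining marks (accents),
    // drop Unicode punctuation, lower-case.
    static String key(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "")
                .replaceAll("\\p{P}", "")
                .toLowerCase(Locale.ROOT);
    }

    // Word-level LCS diff computed on keys, reported with the ORIGINAL words.
    // "-w": only in left, "+w": only in right, " w": common (right-hand spelling).
    static List<String> diff(String left, String right) {
        String[] a = left.trim().split("\\s+");
        String[] b = right.trim().split("\\s+");
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = key(a[i]).equals(key(b[j]))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (key(a[i]).equals(key(b[j]))) {
                out.add(" " + b[j]);   // same word apart from case/accents/punctuation
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("-" + a[i++]); // word dropped from the left text
            } else {
                out.add("+" + b[j++]); // word added in the right text
            }
        }
        while (i < a.length) out.add("-" + a[i++]);
        while (j < b.length) out.add("+" + b[j++]);
        return out;
    }
}
```

The keys are used only to decide what matches; every reported word keeps its
original accents, case and punctuation. Marking up existing XML in place
would still need the words to be mapped back to their positions in the
fragments, which this sketch does not attempt.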

So I was wondering if it would be worth exploring other diff-ing engines as
well?

Chris