[sword-devel] V11n validation (was Re: Change to Synodal verse system)

Chris Little chrislit at crosswire.org
Mon Aug 1 08:16:33 MST 2011



On 8/1/11 7:03 AM, DM Smith wrote:
> David's observation has got me thinking on whether there is a way to
> detect mismatches. The nature of osis2mod is to be lossless with regard
> to biblical material. If a verse in the OSIS file is not in the v11n,
> then it is appended to the prior verse entry (it is a bit more
> complicated than that, but it gives the idea).
>
> I think a statistical analysis of a text could find such verses. Maybe a
> comparison of the word count per verse of a Greek text for the NT and a
> Hebrew text for the OT could serve as the expected. For a translation,
> its word count would be compared to the reference. I would guess the
> ratio of words in the original compared to the translation would be
> fairly tight. Anything differing significantly from the ratio (perhaps
> 1.5x standard deviation of the ratio) would be flagged.
>
> Since the problem is potentially found only in the last verse of a
> chapter, there could be a flag to report only those. (Analysis would
> probably need to be done on the entire text to get a fair ratio and
> standard deviation.)
>
> I'm sure that there are problems with such an idea. And I'm not sure
> whether it would serve much value beyond checking those with a KJV v11n.
>
> In Him,
> DM
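
A minimal Python sketch of the ratio test DM describes, for illustration only. It assumes each text has already been loaded into a dict keyed by (book, chapter, verse) tuples with plain verse text as values; extracting that from a module is not shown. Word counts use simple whitespace splitting, which only approximates word boundaries in Greek and Hebrew. The function name `flag_outliers` and its parameters `k` and `last_verse_only` are hypothetical, not part of osis2mod or any other Sword tool.

```python
from statistics import mean, stdev


def flag_outliers(reference, translation, k=1.5, last_verse_only=False):
    """Return verse keys whose translation/reference word-count ratio lies
    more than k standard deviations from the mean ratio.

    reference, translation: dict of (book, chapter, verse) -> verse text.
    Hypothetical helper; not a Sword or osis2mod API.
    """
    # Only compare verses present in both texts with a non-empty reference.
    keys = [key for key in reference
            if key in translation and reference[key].split()]
    ratios = {key: len(translation[key].split()) / len(reference[key].split())
              for key in keys}

    # Mean and sample standard deviation over the whole text.
    mu = mean(ratios.values())
    sigma = stdev(ratios.values())

    if last_verse_only:
        # Highest verse number per (book, chapter) in the reference v11n.
        last = {}
        for book, chapter, verse in reference:
            last[(book, chapter)] = max(verse, last.get((book, chapter), 0))
        candidates = [key for key in keys if key[2] == last[key[:2]]]
    else:
        candidates = keys

    return sorted(key for key in candidates if abs(ratios[key] - mu) > k * sigma)
```

Because the statistics are taken over the entire text, a handful of inflated verses will not set the baseline, though they do widen the standard deviation somewhat; a median-based spread would be a more robust alternative if that proves to be a problem.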

Modules that were re-versified by Sword tools are rather easy to 
identify and then convert back to their native versification, since we 
include identification (within the text) of the concatenated verses and 
their original identity.

The excerpted Ukrainian text doesn't include these (unless David 
happened to remove them before posting), so there's not much chance of 
our being able to export the text, return it to its native 
versification, and re-import it, since the text came to us in a 
KJV-versified format.

--Chris


