[sword-devel] Fwd: Re: [sword-svn] [JIRA] (MODTOOLS-82) usfm2osis.py processing of USFM tables gave XML files that fail syntax check

David Haslam dfhmch at googlemail.com
Fri Jan 2 04:36:55 MST 2015


Hi Barry,

Not everything about USFM that's "wrong" is explicitly forbidden in the USFM
Reference.

It would be an impossible task to predict how USFM might be "misapplied",
especially by those who are NOT users of Paratext.

IMHO, to expect that of the USFM Reference is quite unreasonable.

These matters are more to do with training users - and the Bible Society
does lay on formal training courses for bona fide translators who are
registered to use Paratext.

Nevertheless, I deduced that the ")" was the cause of the problem, since it
is neither a number nor a letter, nor a combination of the two. The
description of the \vp_#\vp* marker pair does refer to "number or letter".

BTW, the extra space after the ")" was not part of the issue.

Now for the hard stuff....

The conversion of such published verse numbers by the Python script is much
more complex than in the earlier Perl script. The latter just does this:

        # \vp...\vp# published verse numbers
        $line =~ s/\\vp\*\s*//g;
        $line =~ s/\\vp\b\s*(\d+[a-z]?|[a-z])\s*/<seg type="verseNumber">$1<\/seg>/g;

In contrast, the Python script takes 23 lines of code to perform the
conversion!

Even so, you can see from the Perl code that it expected either a number
with an optional lowercase letter or a single lowercase letter, and that the
output simply goes into a seg element, i.e. it becomes part of the verse text.
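
For anyone more at home in Python, the two Perl substitutions above can be
transliterated roughly as follows. This is only a sketch of the older Perl
behaviour, not the code in usfm2osis.py; the function name convert_vp and the
sample USFM strings are made up for illustration.

```python
import re


def convert_vp(line):
    """Hypothetical helper mirroring the two Perl substitutions, in order."""
    # 1. Drop every closing \vp* marker together with any whitespace after it.
    line = re.sub(r'\\vp\*\s*', '', line)
    # 2. Wrap a number with an optional lowercase letter, or a single
    #    lowercase letter, that follows an opening \vp marker in a seg element.
    line = re.sub(r'\\vp\b\s*(\d+[a-z]?|[a-z])\s*',
                  r'<seg type="verseNumber">\1</seg>',
                  line)
    return line


# Invented sample inputs, not taken from the text that triggered MODTOOLS-82.
well_formed = convert_vp(r'\vp 3a\vp* In the beginning')
trailing_paren = convert_vp(r'\vp 3)\vp* In')
paren_only = convert_vp(r'\vp )\vp* In')
```

With these inputs, "3a" ends up inside the seg element, the ")" in "3)" is
left outside it as ordinary text, and a lone ")" is not matched at all, so the
opening \vp marker stays in the line unconverted.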

The Python script attempts something much more ambitious and demanding.
This remark in the code indicates what it's aiming to do.

"""Regex helper function to replace verse numbers from \v_# with values that
appeared in \vp_#\vp* and \va_#\va*, returing the verse text as a string.

It's therefore little wonder that complex and repeated use of \vp_#\vp*
would lead to serious problems for the module build.

Best regards,

David




