[sword-devel] usfm2osis.py and tag \cp
Chris Little
chrislit at crosswire.org
Fri Oct 12 12:56:08 MST 2012
On 10/12/2012 4:00 AM, Peter von Kaehne wrote:
> Sorry, while the crash has gone, the function is not correct - at
> all.
>
> \cp is meant to give a printed chapter number which has no influence
> on the underlying counting of verses and chapters. How exactly to
> represent it in OSIS, we would need to figure out, but it should not
> influence the creation of subsequent osisIDs. I would think <hi
> type="bold"> is probably the best for our purposes. The OSIS
> reference is not exactly helpful at this point, nor does it reflect
> the reality of module making.
\cp (like \vp) is a workaround for a limitation in Paratext. Paratext
requires that all chapter and verse numbers be numeric and strictly
increasing. No lettered, out-of-order, or repeated verse or chapter
numbers are permitted. However, actual Bibles sometimes include these
things, so Paratext requires that you enumerate the chapters/verses with
strictly increasing dummy numerals. \cp and \vp then let Paratext
substitute the correct underlying number when rendering.
The description of \cp in the USFM docs states: "This is a chapter
marker (number, letter) which would be displayed in the published text
(where the published marker is different than the \c # used within the
translation editing environment)." The words "translation editing
environment" are a reference to Paratext specifically, and the
description as a whole conveys that \cp is the real chapter number if a
different \c value is necessitated by Paratext.
OSIS doesn't have this limitation. You can encode the real verse and
chapter numbers in OSIS, without need for a workaround.
So usfm2osis.py's replacement of the numeric dummy chapter number with
the chapter number specified in \cp is correct.
If you look at your USFM document, I expect you will see something like:
\c 1
\cp A
...
\c 2
\cp 1
...
\c 3
\cp 2
...
\c 4
\cp 3
...
\c 5
\cp B
...
\c 6
\cp 3
...
\c 7
\cp 4
The strictly increasing \c values are just dummy values for Paratext.
The \cp values represent the actual underlying chapter numbers for this
reference scheme. There aren't two different chapter 3s in Esther, just
one that is briefly interrupted by chapter B, but Paratext can't deal
with the underlying reference system, so it requires the \cp workaround.
Likewise, chapter 4 (\cp 4) isn't really chapter 7 (\c 7).
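
To make the mapping concrete, here is a minimal Python sketch of the idea. It is not the actual usfm2osis.py code: the function name published_chapters and the one-marker-per-line input are assumptions made purely for illustration. Each \c starts a chapter with its dummy value, and a following \cp overrides the value used in the osisID.

```python
def published_chapters(usfm_lines, book="Esth"):
    """Pair each dummy \\c value with the chapter ID it should produce.

    Hypothetical helper for illustration only; assumes one marker per line.
    """
    pairs = []  # each entry is [dummy \c value, effective chapter value]
    for line in usfm_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line, "..." filler, or a marker with no value
        if parts[0] == "\\c":
            # Default: the \c value is the chapter value.
            pairs.append([parts[1], parts[1]])
        elif parts[0] == "\\cp" and pairs:
            # \cp overrides the dummy value of the most recent \c.
            pairs[-1][1] = parts[1]
    return [(dummy, f"{book}.{real}") for dummy, real in pairs]


sample = ["\\c 1", "\\cp A", "...", "\\c 2", "\\cp 1", "...", "\\c 3"]
for dummy, osis_chapter in published_chapters(sample):
    print(dummy, "->", osis_chapter)
```

In this sketch a \c with no \cp after it keeps its own number, which matches the case where the encoder really does mean \c to be the chapter value.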
This is mostly based on my experience encoding USX docs for ABS. If your
USFM encoder intends that the value in \c be the chapter value, then \cp
should not be used. You should look into \ca or \cl as alternatives.
> Right now the code does two things: It replaces in the sample below
> the chapter number 1 with an A for the subsequent verse's osisID
> ("Esth.A.1" instead of "Esth.1.1") and it leaves the \cp A in place.
> This is both not right - both acc OSIS reference and acc the desires
> of the USFM writer in my example.
With the update just committed, usfm2osis.py should now correctly remove
the \cp (and \vp) markers from the output instead of leaving them in
place. That was a bug (actually a set of bugs). Again, I regrettably
haven't tested this, but the code looks good to me.
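
For illustration only, stripping the markers could look roughly like the sketch below. This is not the code that was committed; the function name strip_published_markers and the regular expressions are my assumptions. It assumes \cp sits on its own line and that \vp is closed with \vp*.

```python
import re


def strip_published_markers(usfm):
    """Remove \\cp lines and \\vp ...\\vp* spans from a USFM string.

    Hypothetical helper for illustration; not taken from usfm2osis.py.
    """
    # Drop whole "\cp <value>" lines, including the trailing newline.
    usfm = re.sub(r"^\\cp[ \t]+\S+[ \t]*\n?", "", usfm, flags=re.MULTILINE)
    # Drop inline "\vp <value>\vp*" spans and one following space, if any.
    usfm = re.sub(r"\\vp[ \t]+.*?\\vp\*[ \t]?", "", usfm)
    return usfm


text = "\\c 1\n\\cp A\n\\v 1 \\vp 1a\\vp* Some verse text\n"
print(strip_published_markers(text))
```

A real converter would first record the \cp and \vp values for use in the osisIDs; this sketch only shows the removal step.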
--Chris