[sword-devel] Where is the line between translation and publication?

Thu Feb 17 22:17:54 MST 2005

I now wax philosophical, exploring the reasons and motivations behind
some of the work I do.

There are several reasons we use text-based Scripture file interchange
standards in Bible translation work:
* Longevity of the format
* Ease of conversion to other formats
* Portability of the Scripture files between entities
* Separation of logical markup from physical markup to make publication
in multiple formats easier

Consider, for example, a Bible translation completed in 1985 and due for
revision. Would you rather start with a WordStar document with all of
its manual formatting, or an equivalent SFM file? After a period of
transition, with USFM and an XML Scripture standard in peaceful
coexistence, I would expect that most USFM Scripture files could be
converted to an XML format, and those should be accessible for a long
time, provided the schemas used are well-designed. Not just any XML is
good enough. You can take anything you can import to Microsoft Word, and
save it in WordML format, and it is XML, but not necessarily useful XML.
Enough said about longevity.

Both SFM and XML are easy to parse with computer programs and convert to
other formats or do other processing, such as consistency checks. XML is
easier to handle with commercial and open source tools, and XSLT only
works with XML sources.
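
As a rough illustration of how little code an SFM consistency check can
take, here is a minimal Python sketch. It assumes a simplified marker
syntax (a backslash, lowercase letters, optional digits, optional "*")
and is not a full USFM parser. The names tokenize_sfm,
check_chapter_order, and SAMPLE are hypothetical, made up for this
example.

```python
import re

# Assumed, simplified marker syntax: backslash, lowercase letters,
# optional digits, optional '*' for a closing marker (e.g. \v, \q1, \wj*).
MARKER = re.compile(r"\\([a-z]+[0-9]*\*?)")


def tokenize_sfm(text):
    """Return (marker, content) pairs; content is the text after the marker."""
    tokens = []
    pos = 0
    marker = None
    for match in MARKER.finditer(text):
        chunk = text[pos:match.start()].strip()
        if marker is not None or chunk:
            tokens.append((marker, chunk))
        marker = match.group(1)
        pos = match.end()
    tail = text[pos:].strip()
    if marker is not None or tail:
        tokens.append((marker, tail))
    return tokens


def check_chapter_order(tokens):
    """Consistency check: chapter markers should count up 1, 2, 3, ..."""
    problems = []
    expected = 1
    for marker, content in tokens:
        if marker != "c":
            continue
        if not content.isdecimal():
            problems.append(f"chapter marker without a number: {content!r}")
            continue
        number = int(content)
        if number != expected:
            problems.append(f"expected chapter {expected}, found {number}")
        expected = number + 1
    return problems


# Made-up sample with a deliberate gap between chapters 1 and 3.
SAMPLE = r"""\id JHN
\c 1
\p
\v 1 In the beginning was the Word.
\c 3
\p
\v 16 For God so loved the world.
"""

if __name__ == "__main__":
    for problem in check_chapter_order(tokenize_sfm(SAMPLE)):
        print(problem)
```

The same check against an XML source would replace the regular
expression with an XML parser, but the logic would be unchanged.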

Until recently, "Standard Format" wasn't so standard, so tags used in
one entity weren't always what were used in another. I'm pleased with
USFM as a solution. Likewise, OSIS is an attempt to unify several
incompatible efforts to represent Scripture texts in XML. Both aim to
make sure that Scripture files produced by one entity can be used by
another.

I am interested in your opinions: where exactly is the line between
physical and logical markup? By logical markup, I mean marking things
like chapter and verse starting points with markup that means "Chapter 3
starts here," "Chapter 16 starts here," "This text is a section title".
Physical markup would say something more like "render verse numbers as
6-point superscript numbers in bold Arial font" or "render section
titles as 10-point Arial font, centered, 10 points before this line, 5
points after this line, keep with next paragraph." Different physical
markup can be easily applied to the exact same logically-marked text to
produce different publication products, such as a pocket New Testament,
a large print edition for the visually impaired, a "normal" sized Bible,
an HTML Bible, a module for a Bible study program on a Palm Pilot, or a
Braille edition. USFM is logical markup. OSIS is supposed to be logical
markup. So far, so good. That is why you find no way in USFM to set font
sizes of the various kinds of text, but you can specify tags that
translate to styles in Microsoft Word, Ventura Publisher, or whatever,
and in the publication process you can control the typography in great
detail.
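
To make the separation concrete, here is a small Python sketch in which
one logically marked passage is rendered through two different
"physical" style tables. The role names (chapter, section_title, verse)
and the two style tables are assumptions invented for this example;
they are not part of USFM or OSIS.

```python
import html

# Logically marked text: (role, fields) pairs. The roles are illustrative,
# loosely modeled on what \c, \s, and \v express in USFM.
PASSAGE = [
    ("chapter", ("3",)),
    ("section_title", ("Jesus and Nicodemus",)),
    ("verse", ("16", "For God so loved the world.")),
]

# Two hypothetical style tables. Each maps a logical role to a physical
# rendering, and neither one touches the text itself.
PLAIN_TEXT = {
    "chapter": "Chapter {0}\n",
    "section_title": "\n{0}\n",
    "verse": "[{0}] {1}\n",
}

HTML_STYLE = {
    "chapter": "<h2>{0}</h2>\n",
    "section_title": "<h3>{0}</h3>\n",
    "verse": "<p><sup>{0}</sup> {1}</p>\n",
}


def render(passage, style, escape=str):
    """Apply a style table to logically marked text."""
    return "".join(
        style[role].format(*(escape(field) for field in fields))
        for role, fields in passage
    )


if __name__ == "__main__":
    print(render(PASSAGE, PLAIN_TEXT))
    print(render(PASSAGE, HTML_STYLE, escape=html.escape))
```

Adding a large print or Braille edition would mean adding another style
table, with PASSAGE left exactly as the translators delivered it.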

I have seen, however, circumstances where the lines between logical and
physical markup get blurred, as well as the lines between the
responsibility of the translators vs. the responsibility of the
programmers and typesetters. One non-fictional example involves a Bible
study software vendor (who shall remain nameless in this email to
protect the guilty party). He had a strong preference for rendering
certain proper names in the Holy Bible in certain ways, especially God's
Proper Name in the Old Testament. Knowing that others had strong
preferences, but not necessarily the same preferences, he generously
provided a facility in his program by which it would perform word
substitutions on-the-fly in the Bible text as displayed. That way, no
matter if you preferred to see the Most Holy Name in its original 4
Hebrew letters, an English transliteration of the same, or as a
translation beginning with either "Y" or "J", you could. You could
change "Jesus" to "Joshua" or "Yeshua". Furthermore, if you thought the
Holy Bible in the translations you were reading was just too sexist, you
could change all instances of "Father" to "Parent" and "brother" to
"sibling". Indeed, you could replace "hell" with "party place" if you
had no fear of God. Or, you could do more mundane things like replace
"Esaias" with "Isaiah" in the KJV. Would you call this clever
programming a feature or a bug? Should publishers (including Bible
translation software writers) have the right to make such changes in the
Bible translation text? Copyright law notwithstanding, is it morally
right? I would say that publishers and programmers have no business
changing the text of a translation, unless they are fully authorized to
make derivative works from the base translation and ready to take full
responsibility before God for their translation work.

What about quotation marks?

There are at least three schools of thought on quotation marks. I would
like to explore these a little more and see what you think. I hope to
generate more light than heat. May God grant it to be so. :-)

Position #1: Quotation punctuation marks are a language-dependent part
of the Bible translation, and should be placed by the translators, and
not by programmers or typesetters. Quotation marks should be placed in
the text in the same way as other punctuation, like periods and commas.
This is the position implicit in the USFM 2.0 standard, which has no
markers for starting and ending quotation marks. (Unofficial extensions
allow some keyboarding shortcuts, assuming that these will be converted
to the proper typographic quotation mark characters.) USFM 2.0 does
support character styles for OT (or other) quotes (\qt ...\qt*) and for
Words of Jesus (\wj ...\wj*), but those are optional character styles to
be strictly nested within paragraph styles, and do not indicate
beginnings and ends of quotations. The example given for \qt usage shows
quotation marks used together with the markup.

Position #2: Quotation punctuation marks are not really part of the
Bible translation text, but like fonts, are a feature to be rendered in
whatever way seems good for the output format at hand, based on some
external style sheet and markup that indicates that quotations are being
made. Markup should indicate where quotations start and stop, but the
punctuation associated with those quotes should always be generated on
the fly and never included in the text. See the instructions to this
effect at
http://www.bibletechnologies.net/OSISUserManual21draft.dsp#osisusermanual_.26-div-d0e2881,
where they give a clear example of replacing quotation marks with <q>
markup. (The same sort of thing is said in the <speech> tag
documentation in the same document.) It is up to the discretion of the
renderer (i.e., programmers, typesetters, and publishers) how quotation
punctuation is to be rendered, and Bible translators should not have a
say in how this is done for their language, or if they do, it should be
done with a separate style specification of some sort, and not in the
Scripture markup source.
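
Here is a minimal Python sketch of what position #2 implies for a
renderer. It assumes a simplified fragment with bare <p> and <q>
elements (real OSIS uses namespaces and attributes that are ignored
here) and a hypothetical external table, QUOTE_STYLES, that supplies
the quotation marks for each nesting level.

```python
import xml.etree.ElementTree as ET

# Hypothetical external style table: (open, close) marks per nesting level.
QUOTE_STYLES = {
    "en-US": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
    "fr": [("\u00ab\u00a0", "\u00a0\u00bb"), ("\u201c", "\u201d")],
}


def render(elem, marks, depth=0):
    """Flatten an element to text, generating quote marks for each <q>."""
    is_quote = elem.tag == "q"
    inner_depth = depth + 1 if is_quote else depth
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, marks, inner_depth))
        parts.append(child.tail or "")
    body = "".join(parts)
    if is_quote:
        # Cycle through the available levels for deeply nested quotes.
        open_mark, close_mark = marks[depth % len(marks)]
        return open_mark + body + close_mark
    return body


# Simplified, made-up fragment: no quotation marks appear in the source.
SOURCE = (
    "<p>Jesus said, <q>It is written, "
    "<q>Man shall not live by bread alone.</q></q></p>"
)

if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(render(root, QUOTE_STYLES["en-US"]))
    print(render(root, QUOTE_STYLES["fr"]))
```

Note that every punctuation decision lives in QUOTE_STYLES and in the
render function. The translator's only input is where each <q> starts
and stops.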

Position #3: Quotation punctuation marks are a language-dependent part
of the Bible translation, but it can be helpful to translators to
initially generate the punctuation automatically from rules, as long as
the translator is free to specify exceptions to those rules. A good
Scripture file interchange standard allows the translator to specify
where the punctuation goes unambiguously for any language, translation,
or style, without reliance on any external style sheet. A good Scripture
file interchange standard may also (and probably should) allow for
generation of quotation punctuation automatically for the following
three cases: (1) initial quotation punctuation generation, (2)
specification of a source text that can be rendered optionally as a list
of verses, where each verse gets open quote reminders (like the NASB),
or as normal poetry and prose, where each paragraph or stanza gets its
own open quote reminders, and (3) adjustment of quotation punctuation on
extraction of a quote from Scripture for inclusion in another text in an
automated setting, where normal English punctuation rules apply. Any
rules-based quotation generation system must allow for such details as
quotation mark nesting level limitation and equivalence of a block inset
without quote marks to a quote mark-delimited section of text. A
quotation mark that was automatically generated can be tagged (like
maybe <generated>≪</generated>) in such a way that such quotation marks
can be stripped out and replaced when regenerated, something you might
want to do after changing the way paragraphs are divided or editing
quotation markup. Renderers not knowing the rules of generation can
simply copy the correct punctuation from the generated element. It would
be easy to make minimal modifications to OSIS to allow this view. If I
still cared if OSIS succeeded as a standard or not, I would definitely
want this to happen.
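
The strip-and-regenerate idea could look something like the following
Python sketch. It is string-based for brevity, and it assumes a
<generated> element combined with <q> markup as floated above. Neither
the combination nor the function names come from any published schema.
The single-level regenerate rule is a stand-in for real rules that
would handle nesting and open quote reminders.

```python
import re

# Matches a generated mark together with its (hypothetical) wrapper element.
GENERATED = re.compile(r"<generated>.*?</generated>", re.DOTALL)


def strip_generated(text):
    """Remove every automatically generated mark, tags included."""
    return GENERATED.sub("", text)


def copy_generated(text):
    """What a renderer ignorant of the rules can do: keep marks, drop tags."""
    return re.sub(r"</?generated>", "", text)


def regenerate(text, open_mark, close_mark):
    """Strip old generated marks, then apply a trivial one-level rule."""
    text = strip_generated(text)
    text = text.replace("<q>", f"<q><generated>{open_mark}</generated>")
    text = text.replace("</q>", f"<generated>{close_mark}</generated></q>")
    return text


# Made-up sample: quotation marks generated earlier with straight quotes.
OLD = '<q><generated>"</generated>Follow me.<generated>"</generated></q>'

if __name__ == "__main__":
    fresh = regenerate(OLD, "\u00ab", "\u00bb")
    print(fresh)
    print(copy_generated(fresh))
```

Marks that a translator typed by hand carry no <generated> tag, so
strip_generated leaves them alone. That is what lets the translator's
exceptions survive regeneration.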

If position #3 is used as the philosophical basis for a Scripture
interchange schema, then the schema and specifications can be made to
reasonably accommodate texts prepared by people who religiously hold to
positions #1 or #2. However, the fundamental incompatibility between
positions #1 and #2 currently prevents fully automatic, lossless
conversion between USFM and OSIS. I am aware of claims to the contrary,
but I remain skeptical based on the current USFM and OSIS documentation,
publicly available on the web today. If you never use \qt, \wj, <q/>, or
<speech/>, then you can probably avoid the whole issue sufficiently to
make a passable solution (albeit one that irritates people of position #2).

Am I being too picky about quotation marks? After all, they aren't used
in the original Greek and Hebrew, and they are just little "jots" and
"tittles". Quotation marks are part of the translated target languages,
with their type and position inferred from the original texts, even if
they aren't in the original texts as punctuation. Different translations
don't always use the same punctuation marks, and they don't always have
the same open quote reminder rules. Indeed, the variations even among
English translations are broad enough, and full of enough exceptions,
to convince me that I don't want to program all of them.

-- 
Kahunapule Michael P. Johnson
http://Kahunapule.org