[osis-editors] suggested corrections

Michael Paul Johnson osis-editors@bibletechnologieswg.org
Thu, 08 Jan 2004 16:26:06 +1000


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


One suggested addition for section 29.1.1 of the OSIS 2.0.1 User's 
Manual:

HNV Hebrew Names Version of the World English Bible, also known as the 
World English Bible: Messianic Edition (WEBME)

(See http://eBible.org/hnv/ for the home of this translation.)

One comment on http://www.bibletechnologies.net/osisCore.2.0.xsd: I 
noticed that unlike the XSEM, which tries to dissect copyright and 
other rights information into little chunks like copyrightDate and 
holder, you just have a <rights> element that can hold a nice plain 
language description of rights (copyrights on various parts of the 
book, different dates and ranges, trademarks, permissions, public 
domain declarations, etc.). I think I like your approach better, 
although this mismatch is one of many things that make lossless
bidirectional transfers to and from other formats less than
straightforward. I suppose multiple rights elements with x-types
corresponding to these little chunks could be used on texts converted 
from XSEM or other formats.
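
A minimal sketch of that idea, assuming the converter already has the
XSEM-style chunks as name/value pairs. The field values are made-up
placeholders, and whether the OSIS schema accepts these "x-" type values
on <rights> is an assumption that should be checked against the schema:

```python
import xml.etree.ElementTree as ET


def rights_elements(fields):
    """Build one <rights> element per XSEM-style chunk.

    The type value is the chunk name with an "x-" prefix. This naming is
    an illustrative assumption, not something defined by OSIS or XSEM.
    """
    elements = []
    for name, value in fields.items():
        element = ET.Element("rights", {"type": "x-" + name})
        element.text = value
        elements.append(element)
    return elements


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    chunks = {"copyrightDate": "1997", "holder": "Example Publisher"}
    for element in rights_elements(chunks):
        print(ET.tostring(element, encoding="unicode"))
```

Converting back would then mean reading the type attribute, stripping the
"x-" prefix, and restoring the chunk name.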

At 21:09 07-01-04, Patrick Durusau wrote:
>> Design issue:
>> 
>> I disagree with forcing the use of <q> or <speech> tags in place of
>> quotation marks in Bible texts. This makes conversion of existing
>> texts which have quotation marks in place more difficult. It also
>> puts more of a burden on OSIS software in dealing with quotation
>> marks for every language, and even differences in style within one
>> language. For example, to use the <q> tags to properly render NASB
>> text vs. NIV text, you would have to encode different rules in the
>> software, but if you just use quotation marks, you can have the same
>> software properly render both. Likewise, rules for continuing
>> quotations through paragraph boundaries tend to vary from language
>> to language, as do the characters used to mark quotations. Using <q>
>> tags is a fine option when you want to encode rules, and for new
>> Bible translation drafts, this is a good option to have. It has the
>> potential of possibly reducing errors in quotation mark placement,
>> if properly used. (I have found many such quotation mark errors in
>> one published Bible in PNG). For existing texts, it is a pain.
>
>Elements like <q> have a long history in markup in general, TEI in
>particular (both Steve DeRose and I hail from that community), and
>even in the debates over the OSIS schema. Let me give you the quick
>take on why we came down on this issue as we did.
>
>The crux of the matter is contained in your statement: "For existing
>texts, it is a pain." Quite so, but it also illustrates why going
>with elements and not inline text markers was the right choice.
>
>Most "existing texts" were written using what used to be called the
>ISO 646 subset, that is ASCII characters that were considered "safe"
>for transmission over the Internet. Works great, for some texts. The
>problem is that it works great only for "some" texts.
>
>If we did not compel the use of the <q> and similar elements, how do
>we distinguish between texts where similar quotes work and those
>where it doesn't?

Why would you want to? How would that be a benefit? Does anybody 
actually use the ability to programmatically recognize quotations for 
anything but generation of the correct punctuation marks and sometimes 
for highlighting Jesus' direct quotations?

If the purpose is to generate the correct quotation marks, then it is 
pointless to use <q> if the correct quotation marks are already in the 
text. If there is an alternate way to indicate the only two kinds of 
quotations that people tend to render differently within actual 
canonical text (Jesus' words and OT quotes in the NT), then <q> is 
totally unnecessary. However, I support its use for people who want to 
use those marks to generate punctuation where none exists already.

Would it not rather be superior to use the correct punctuation for 
each language? Indeed, you could use a <sentence> element instead of 
marking sentences with initial capitals and periods, but that doesn't 
work in all languages, and it is an unnecessary complication with no 
benefit. I don't see the <q> used to generate quotation marks as being 
any different. Your implementation of <q> is insufficient to properly
represent the stylistic variations of English texts, let alone those
of multiple languages. What is lacking is a specification of opening
and closing quotation marks for each level, and an indication of the
rules for continuation reminders. How do you differentiate between
NASB rules and NIV rules? Currently, you don't. How can that be
acceptable? To me, it is not. If you made the use of the <q> tag
totally optional, and specified that punctuation characters and rules
must be specified in some way (maybe outside of OSIS, just as
typographic information for rendering OSIS texts in HTML, PDF, or
other formats would be), then it could be acceptable, provided you
also give me a good way to mark Jesus' words.

> Do we have two systems, one for "existing" texts and another 
>for "other" texts?

This is not as bad as it sounds. If the proper quotation marks are in 
the text of the Scriptures already, they are rendered just like any 
other punctuation. This is not a problem for the software. If the 
provided text uses <q> or <speech> markup AND informs the software in
some way how to render the correct punctuation from this element
(i.e. which characters to use for quotation marks at various levels
and how to handle continuation reminders and at what points), then the
software simply generates them on the fly as the start and end
milestones are encountered and as paragraph and verse beginnings are
encountered. If I were writing such software, it would default to NIV
style quotation marks, but if you wanted NASB style, Italian, or
Spanish styles, those would have to be specified some way or the
punctuation marks would be incorrectly placed. You could even mix
quotation marks and <q> markers in different parts of the book, as
long as they never nested within one another, with no problem. The
only restriction should be that you should never use both <q> and
quotation marks in the same place, or the software would likely render
twice as many quotation marks as are needed.
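
To make the idea concrete, here is a rough sketch of such on-the-fly
generation. Everything in it is an assumption for illustration: the event
names, the QuoteStyle table, and the two sample styles are hypothetical
and do not reproduce the actual NIV, NASB, or any other publisher's
rules. In particular, repeating the opening mark of every open level at a
new paragraph is only one possible continuation rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuoteStyle:
    # (opening, closing) marks per nesting level; cycles if nested deeper.
    marks: tuple
    # Repeat opening marks at each new paragraph while a quote is open?
    continuation: bool = True


# Hypothetical style tables, for illustration only.
DOUBLE_THEN_SINGLE = QuoteStyle(
    marks=(("\u201c", "\u201d"), ("\u2018", "\u2019")))
GUILLEMETS_NO_REMINDER = QuoteStyle(
    marks=(("\u00ab", "\u00bb"), ("\u201c", "\u201d")), continuation=False)


def render(events, style=DOUBLE_THEN_SINGLE):
    """Turn (kind, value) events into text with generated quotation marks.

    Kinds: "text", "q_start", "q_end", "para" (paragraph boundary).
    """
    out = []
    depth = 0  # number of currently open quotation levels
    for kind, value in events:
        if kind == "text":
            out.append(value)
        elif kind == "q_start":
            out.append(style.marks[depth % len(style.marks)][0])
            depth += 1
        elif kind == "q_end":
            depth -= 1
            out.append(style.marks[depth % len(style.marks)][1])
        elif kind == "para":
            out.append("\n")
            if style.continuation:
                # Continuation reminder: reopen every level still open.
                for level in range(depth):
                    out.append(style.marks[level % len(style.marks)][0])
        else:
            raise ValueError("unknown event kind: " + repr(kind))
    return "".join(out)


if __name__ == "__main__":
    events = [("text", "He said, "), ("q_start", None), ("text", "Go."),
              ("para", None), ("text", "Then rest."), ("q_end", None)]
    print(render(events))
    print(render(events, GUILLEMETS_NO_REMINDER))
```

The same event stream renders differently only because a different style
table is passed in, which is the information the text or its
accompanying metadata would have to supply.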

> Recall that users, just like the rest of us, will 
>always pick the easier route when available, which will mean that 
>instead of a uniform system of <q> elements, we will be where we are 
>now, that is use of markers that we may or may not interpret properly 
>when we get a text from France, Spain, etc.

More to the point, when you get texts marked with <q> instead of
proper quotation marks, in languages that don't mark quotations
exactly like standard English, you will probably render them
incorrectly. You
risk using the wrong markers and you risk violating quotation 
continuation mark rules at paragraph boundaries. Even within English, 
you risk disregarding style decisions of the translators. In addition, 
if you attempt to make a standard that is widely perceived as being 
either inadequate or too expensive to use, then it will simply be 
ignored and die a quiet death or worse yet, be used in spite of 
superior alternatives, just because it was created sooner.

>This was debated at length and you are correct, it is a pain, but one 
>that we can get past with some software help and we won't have the 
>problems we do now.

To get past this pain, you must first establish a desire to get past 
it. Help me out, here. Why would I want to? I still don't see any 
benefit to doing so, except when drafting new texts that use the same 
punctuation rules as "standard" English. You also need to allow the
creative freedom to use a typographic means such as block indentation 
instead of quotation marks (commonly used to quote an entire letter). 
I see no benefit whatsoever to requiring the use of <q> to the
exclusion of normal typographic quotation marks for the language in 
use. Pain with a purpose might be acceptable. Pain without a purpose 
makes other alternatives look more attractive. OSIS is not assured of 
acceptance as a standard at this point any more than XSEM was.

>> This pain is exacerbated if you want
>> to encode the words of Jesus Christ so that they may be optionally
>> rendered differently (i.e. red ink). If you use proper quotation
>> marks instead of <q> or <speech>, then you have no good way in OSIS
>> of marking Jesus' words. I'm thinking that <hi type="x-JesusSaid">
>> (or any of a multitude of other nonstandard attributes) might work,
>> if no better option is presented. You might rightly argue that there
>> was no typographical or color difference made in the rendition of
>> Jesus' words in the original handwritten manuscripts, and you would
>> be right. However, if the goal is to be able to reproduce the most
>> important elements of existing published Bible texts, as well as new
>> ones, then red letter edition marking is required to honor this
>> admittedly later tradition. Use of the <hi> marker for Jesus' words
>> seems to violate your intentions, but use of <q> INSTEAD of
>> quotation marks for Jesus' words is not acceptable to me. Do you
>> have another alternative?
>> 
>
>Actually, the <q> element has a "who" attribute which is what I
>anticipated people using for things such as marking the words of
>Jesus. It takes a string value, in other words it does not require
>the "x-" prefix.

I know that, but I don't want to use <q> for this purpose. Period. It 
is semantically for a different purpose. Consider a red-letter edition 
of the ASV or KJV, for example. If you add quotation marks around 
Jesus' words, you are altering the text, making it not a faithful copy 
of either of those translations. They don't use quotation marks 
anywhere in the Scriptures. (Added study notes might.)

>Note that this allows you to mark not the presentation of the words,
>which could vary (what if I am color-blind and want the words of
>Jesus in bold instead of red?), but the reason why you want the words
>to be rendered differently, which could vary from user to user. Same
>purpose as having the red letters, but making sure you can honor the
>purpose and not just the most common way of rendering it. Another use
>would be for the visually impaired, for whom the red letter or bold
>would have no meaning. With the common quote marker, I can't
>distinguish between a quote of one of the pharisees or Jesus, which I
>think would be important to the visually impaired reader.

I accept all of that. Indeed, I do the same thing with the World 
English Bible and the HNV. The most recent printed editions of those 
use bold for Jesus' words-- not really for the color-blind, but for 
economy of printing while still making a typographic distinction. 
Other editions make no distinction. Personally, I don't care if Jesus' 
words are rendered differently or not, but I do care that the markup 
correctly indicates where they are to give publishers and web 
designers a choice. My point is that you don't give me a good way to 
mark Jesus' words unless I make use of the <q> tag, do some 
nonstandard markup like hijacking <hi> for that purpose, or create a 
competing standard.
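
The two markup options discussed here can be compared with a small
sketch. It assumes un-namespaced fragments, a "who" value of "Jesus", and
short sample sentences, all of which are illustrative choices rather than
anything the OSIS schema prescribes:

```python
import xml.etree.ElementTree as ET


def jesus_words(fragment):
    """Return the text of every span marked as Jesus' words.

    Accepts either <q who="Jesus"> or the nonstandard
    <hi type="x-JesusSaid"> (both attribute values are assumptions).
    """
    root = ET.fromstring(fragment)
    found = []
    for element in root.iter():
        marked_q = element.tag == "q" and element.get("who") == "Jesus"
        marked_hi = (element.tag == "hi"
                     and element.get("type") == "x-JesusSaid")
        if marked_q or marked_hi:
            found.append("".join(element.itertext()))
    return found


if __name__ == "__main__":
    # Sample fragments, for illustration only.
    with_q = '<verse>Jesus said, <q who="Jesus">Follow me.</q></verse>'
    with_hi = ('<verse>Jesus said, '
               '<hi type="x-JesusSaid">Follow me.</hi></verse>')
    print(jesus_words(with_q))
    print(jesus_words(with_hi))
```

Either form lets software find the span. The difference is that a
renderer will normally generate quotation marks for the <q> form, which
is exactly what a faithful copy of a text without quotation marks has to
avoid.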

>Appreciate your interest in OSIS and your hard work to bring the
>Bible to more people.

Thank you. I appreciate your efforts to create a Scripture markup 
standard that is open, XML-based, widely supported, and functional for 
many purposes that are near and dear to my heart. I think that you are 
almost there.

A long time ago (before I knew of the existence of SIL/UBS SFM, and 
before XML was made a standard), and on a continent far away from the 
island I'm writing from, I invented a little-known Bible markup 
standard called GBF <http://eBible.org/bible/gbf.htm>. It is based on 
a very minimalist philosophy of using just the minimum necessary 
markup to fully specify just the canonical text of the Holy Bible, 
plus very little more to cover things I thought I might need some day. 
It works. I can automatically convert from that format to HTML, PDF 
ready for typesetting, various formats used by Bible study programs, 
and more. It is a bit too minimalist to use for everything that SIL, 
EBT, UBS, and various other organizations want from a Scripture markup 
standard, but I find that it is still a good measure of a markup 
standard. Because it is so minimalist, but also contains all of the 
elements needed to produce the main text of a Bible, I should be able 
to losslessly convert from GBF to any good Bible markup standard with 
a simple computer program, give or take a little meta data. I could 
convert GBF to OSIS, right now, by ignoring the existence of <q> and 
using <hi> to mark Jesus' words. Alternatively, I could create some 
convoluted special code to distinguish between the final apostrophe 
used to mark plural possessives and the exact same character used to 
indicate the end of a second level quotation, replace all opening 
quotes with q benchmarks, except for paragraph start opening quote 
reminders (which would be discarded), replace all close quotes with q 
end benchmarks, and then, to top it all off, code the inverse 
operation and generalize the whole mess to accommodate other 
languages. It is easier to lobby for a change to OSIS or update GBF to 
an XML format, I think. In the meantime, you would do well to convert
some more texts yourselves and see some of these issues in a more 
practical light.
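
A rough sketch of the conversion described above shows where it goes
wrong. The "{q1}" style markers are hypothetical placeholders, not OSIS
syntax, and the apostrophe rule (a right single quote followed by a
letter is an apostrophe) is a deliberately naive assumption. Paragraph
start reminders are not handled at all.

```python
OPEN_1, CLOSE_1 = "\u201c", "\u201d"   # first-level quotation marks
OPEN_2, CLOSE_2 = "\u2018", "\u2019"   # second level; CLOSE_2 is also the apostrophe


def quotes_to_markers(text):
    """Replace typographic quotes with placeholder start/end markers."""
    out = []
    stack = []  # levels of the quotations currently open
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == OPEN_1:
            stack.append(1)
            out.append("{q1}")
        elif ch == CLOSE_1 and stack and stack[-1] == 1:
            stack.pop()
            out.append("{/q1}")
        elif ch == OPEN_2:
            stack.append(2)
            out.append("{q2}")
        elif (ch == CLOSE_2 and stack and stack[-1] == 2
              and not nxt.isalpha()):
            # Naive guess: not followed by a letter, so it ends the quote.
            stack.pop()
            out.append("{/q2}")
        else:
            out.append(ch)  # ordinary character, including apostrophes
    return "".join(out)


if __name__ == "__main__":
    print(quotes_to_markers(
        "He said, \u201cShe told me, \u2018Don\u2019t go.\u2019\u201d"))
    # A plural possessive inside a second-level quote is misread as
    # the end of that quote:
    print(quotes_to_markers(
        "\u201cHe said, \u2018Wash the disciples\u2019 feet.\u2019\u201d"))
```

The first sample converts cleanly. The second closes the inner quotation
after "disciples" and leaves the real closing mark behind as a stray
apostrophe, which is the ambiguity that makes the special code
convoluted.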

Anyway, thank you for at least considering my requests and reading 
this far.


>Hope you are having a great day!

I am. Likewise!

Michael
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm

iD8DBQE//PetRI/gxxfXR7sRAij/AJ99RhGv4YrmDmyqKTAh8j95PxelhgCgq/xH
QUmzIC4bUOLa/x91veJ1yqs=
=hDvW
-----END PGP SIGNATURE-----