[osis-editors] suggested corrections
Michael Paul Johnson
osis-editors@bibletechnologieswg.org
Thu, 08 Jan 2004 16:26:06 +1000
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
One suggested addition for section 29.1.1 of the OSIS 2.0.1 User's
Manual:
HNV Hebrew Names Version of the World English Bible, also known as the
World English Bible: Messianic Edition (WEBME)
(See http://eBible.org/hnv/ for the home of this translation.)
One comment on http://www.bibletechnologies.net/osisCore.2.0.xsd: I
noticed that unlike the XSEM, which tries to dissect copyright and
other rights information into little chunks like copyrightDate and
holder, you just have a <rights> element that can hold a nice plain
language description of rights (copyrights on various parts of the
book, different dates and ranges, trademarks, permissions, public
domain declarations, etc.). I think I like your approach better,
although this mismatch is one of many things that make lossless
bidirectional transfers to and from other formats less than
straightforward. I suppose multiple rights elements with x-types
corresponding to these little chunks could be used on texts converted
from XSEM or other formats.
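
A minimal sketch of that last idea, assuming a converter already holds
the XSEM-style chunks as name/value pairs. The function name, the
"x-" + chunk-name convention for the type attribute, and the sample
values are illustrative assumptions, not taken from either schema:

```python
import xml.etree.ElementTree as ET

def rights_elements(chunks):
    """Build one <rights> element per chunk, typed as x-<chunk name>.

    `chunks` is a hypothetical mapping such as
    {"copyrightDate": "...", "holder": "..."} taken from another format.
    """
    elements = []
    for name, value in chunks.items():
        element = ET.Element("rights", {"type": "x-" + name})
        element.text = value
        elements.append(element)
    return elements

if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {"copyrightDate": "1997", "holder": "Example Publisher"}
    for element in rights_elements(sample):
        print(ET.tostring(element, encoding="unicode"))
```

Converting back would only need to read the type attribute of each
rights element and strip the "x-" prefix, which is what would keep the
round trip lossless for these chunks.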
At 21:09 07-01-04, Patrick Durusau wrote:
>> Design issue:
>>
>> I disagree with forcing the use of <q> or <speech> tags in place of
>> quotation marks in Bible texts. This makes conversion of existing
>> texts which have quotation marks in place more difficult. It also
>> puts more of a burden on OSIS software in dealing with quotation
>> marks for every language, and even differences in style within one
>> language. For example, to use the <q> tags to properly render NASB
>> text vs. NIV text, you would have to encode different rules in the
>> software, but if you just use quotation marks, you can have the
>> same software properly render both. Likewise, rules for continuing
>> quotations through paragraph boundaries tend to vary from language
>> to language, as do the characters used to mark quotations. Using
>> <q> tags is a fine option when you want to encode rules, and for
>> new Bible translation drafts, this is a good option to have. It has
>> the potential of possibly reducing errors in quotation mark
>> placement, if properly used. (I have found many such quotation mark
>> errors in one published Bible in PNG). For existing texts, it is a
>> pain.
>
>Elements like <q> have a long history in markup in general, TEI in
>particular (both Steve DeRose and I hail from that community), and
>even in the debates over the OSIS schema. Let me give you the quick
>take on why we came down on this issue as we did.
>
>The crux of the matter is contained in your statement: "For existing
>texts, it is a pain." Quite so, but it also illustrates why going
>with elements and not inline text markers was the right choice.
>
>Most "existing texts" were written using what used to be called the
>ISO 646 subset, that is ASCII characters that were considered "safe"
>for transmission over the Internet. Works great, for some texts. The
>problem is that it works great only for "some" texts.
>
>If we did not compel the use of the <q> and similar elements, how do
>we distinguish between texts where similar quotes work and those
>where it doesn't?
Why would you want to? How would that be a benefit? Does anybody
actually use the ability to programmatically recognize quotations for
anything but generation of the correct punctuation marks and sometimes
for highlighting Jesus' direct quotations?
If the purpose is to generate the correct quotation marks, then it is
pointless to use <q> if the correct quotation marks are already in the
text. If there is an alternate way to indicate the only two kinds of
quotations that people tend to render differently within actual
canonical text (Jesus' words and OT quotes in the NT), then <q> is
totally unnecessary. However, I support its use for people who want to
use those marks to generate punctuation where none exists already.
Would it not rather be superior to use the correct punctuation for
each language? Indeed, you could use a <sentence> element instead of
marking sentences with initial capitals and periods, but that doesn't
work in all languages, and it is an unnecessary complication with no
benefit. I don't see the <q> used to generate quotation marks as being
any different. Your implementation of <q> is insufficient to
properly represent the stylistic variations of English texts, let
alone multiple languages. What is lacking is a
specification of opening and closing quotation marks for each level,
and an indication of the rules for continuation reminders. How do you
differentiate between NASB rules and NIV rules? Currently, you don't.
How can that be acceptable? To me, it is not. If you made the use of
the <q> tag totally optional, and specified that punctuation
characters and rules must be specified in some way (maybe outside of
OSIS, just as typographic information for rendering OSIS texts in
HTML, PDF, or other formats would be), then it could be acceptable,
provided you give me a good way to mark Jesus' words.
> Do we have two systems, one for "existing" texts and another
>for "other" texts?
This is not as bad as it sounds. If the proper quotation marks are in
the text of the Scriptures already, they are rendered just like any
other punctuation. This is not a problem for the software. If the
provided text uses <q> or <speech> markup AND informs the software in
some way how to render the correct punctuation from this element
(i.e., which characters to use for quotation marks at various levels
and how to handle continuation reminders and at what points), then the
software simply generates them on the fly as the start and end
milestones are encountered and as paragraph and verse beginnings are
encountered. If I were writing such software, it would default to NIV
style quotation marks, but if you wanted NASB style, Italian, or
Spanish styles, those would have to be specified some way or the
punctuation marks would be incorrectly placed. You could even mix
quotation marks and <q> markers in different parts of the book, as
long as they never nested within one another, with no problem. The
only restriction should be that you should never use both <q> and
quotation marks in the same place, or the software would likely render
twice as many quotation marks as are needed.
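
A rough sketch of such on-the-fly generation, assuming the text has
already been parsed into a flat list of events. The event names, the
two style tables, and the choice to repeat only the innermost open
mark as a continuation reminder are illustrative assumptions; they are
not OSIS features and do not claim to match the NIV or NASB rules:

```python
# Hypothetical style tables: (open, close) marks per nesting level, plus
# a flag saying whether an opening mark is repeated at each new
# paragraph while a quotation is still open.
CURLY = {"marks": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
         "reminder": True}
GUILLEMETS = {"marks": [("\u00ab", "\u00bb"), ("\u201c", "\u201d")],
              "reminder": False}

def render(events, style):
    """Turn (kind, value) events into text with generated quotation marks.

    Kinds: "text", "q_start", "q_end", "para" (paragraph boundary).
    """
    marks = style["marks"]
    out = []
    depth = 0  # number of quotations currently open
    for kind, value in events:
        if kind == "q_start":
            out.append(marks[depth % len(marks)][0])
            depth += 1
        elif kind == "q_end":
            depth -= 1
            out.append(marks[depth % len(marks)][1])
        elif kind == "para":
            out.append("\n")
            if style["reminder"] and depth > 0:
                # Continuation reminder: repeat the innermost opening mark.
                out.append(marks[(depth - 1) % len(marks)][0])
        else:
            out.append(value)
    return "".join(out)

if __name__ == "__main__":
    events = [("text", "He said, "), ("q_start", None), ("text", "Go."),
              ("para", None), ("text", "Return."), ("q_end", None)]
    print(render(events, CURLY))
    print(render(events, GUILLEMETS))
```

The same event list yields different punctuation under each table,
which is the information a text marked only with <q> would have to
supply from somewhere.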
> Recall that users, just like the rest of us, will
>always pick the easier route when available, which will mean that
>instead of a uniform system of <q> elements, we will be where we are
>now, that is use of markers that we may or may not interpret properly
>when we get a text from France, Spain, etc.
More to the point, when you get texts marked with <q> instead of
proper quotation marks, in languages that don't mark quotations
exactly as standard English does, you will probably render them
incorrectly. You
risk using the wrong markers and you risk violating quotation
continuation mark rules at paragraph boundaries. Even within English,
you risk disregarding style decisions of the translators. In addition,
if you attempt to make a standard that is widely perceived as being
either inadequate or too expensive to use, then it will simply be
ignored and die a quiet death or worse yet, be used in spite of
superior alternatives, just because it was created sooner.
>This was debated at length and you are correct, it is a pain, but one
>that we can get past with some software help and we won't have the
>problems we do now.
To get past this pain, you must first establish a desire to get past
it. Help me out, here. Why would I want to? I still don't see any
benefit to doing so, except when drafting new texts that use the same
punctuation rules as "standard" English. You also need to allow the
creative freedom to use a typographic means such as block indentation
instead of quotation marks (commonly used to quote an entire letter).
I see no benefit whatsoever to requiring the use of <q> to the
exclusion of normal typographic quotation marks for the language in
use. Pain with a purpose might be acceptable. Pain without a purpose
makes other alternatives look more attractive. OSIS is not assured of
acceptance as a standard at this point any more than XSEM was.
>> This pain is exacerbated if you want
>> to encode the words of Jesus Christ so that they may be optionally
>> rendered differently (i.e., red ink). If you use proper quotation
>> marks instead of <q> or <speech>, then you have no good way in OSIS
>> of marking Jesus' words. I'm thinking that <hi type="x-JesusSaid">
>> (or any of a multitude of other nonstandard attributes) might work,
>> if no better option is presented. You might rightly argue that there
>> was no typographical or color difference made in the rendition of
>> Jesus' words in the original handwritten manuscripts, and you would
>> be right. However, if the goal is to be able to reproduce the most
>> important elements of existing published Bible texts, as well as new
>> ones, then red letter edition marking is required to honor this
>> admittedly later tradition. Use of the <hi> marker for Jesus' words
>> seems to violate your intentions, but use of <q> INSTEAD of
>> quotation marks for Jesus' words is not acceptable to me. Do you
>> have another alternative?
>>
>
>Actually, the <q> element has a "who" attribute which is what I
>anticipated people using for things such as marking the words of
>Jesus. It takes a string value, in other words it does not require
>the "x-" prefix.
I know that, but I don't want to use <q> for this purpose. Period. It
is semantically for a different purpose. Consider a red-letter edition
of the ASV or KJV, for example. If you add quotation marks around
Jesus' words, you are altering the text, making it not a faithful copy
of either of those translations. They don't use quotation marks
anywhere in the Scriptures. (Added study notes might.)
>Note that this allows you to mark not the presentation of the words,
>which could vary (what if I am color-blind and want the words of
>Jesus in bold instead of red?), but the reason why you want the words
>to be rendered differently, which could vary from user to user. Same
>purpose as having the red letters, but making sure you can honor the
>purpose and not just the most common way of rendering it. Another use
>would be for the visually impaired, for whom the red letter or bold
>would have no meaning. With the common quote marker, I can't
>distinguish between a quote of one of the pharisees or Jesus, which I
>think would be important to the visually impaired reader.
I accept all of that. Indeed, I do the same thing with the World
English Bible and the HNV. The most recent printed editions of those
use bold for Jesus' words-- not really for the color-blind, but for
economy of printing while still making a typographic distinction.
Other editions make no distinction. Personally, I don't care if Jesus'
words are rendered differently or not, but I do care that the markup
correctly indicates where they are to give publishers and web
designers a choice. My point is that you don't give me a good way to
mark Jesus' words unless I make use of the <q> tag, do some
nonstandard markup like hijacking <hi> for that purpose, or create a
competing standard.
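
To illustrate the kind of choice I mean, here is a small sketch that
renders a fragment marked with the nonstandard <hi type="x-JesusSaid">
workaround into HTML, leaving the presentation up to the publisher. The
function name and the presentation names ("red", "bold", "plain") are
made up for this example, and the fragment is simplified to one level
of child elements inside a verse:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical presentation choices; the markup only records *where*
# the words are, not how they should look.
WRAPPERS = {
    "red": ('<span style="color:red">', "</span>"),
    "bold": ("<b>", "</b>"),
    "plain": ("", ""),
}

def render_verse(osis_fragment, presentation="red"):
    """Render one verse fragment to HTML, styling the marked words."""
    root = ET.fromstring(osis_fragment)
    open_tag, close_tag = WRAPPERS[presentation]
    parts = [escape(root.text or "")]
    for child in root:
        inner = escape("".join(child.itertext()))
        if child.tag == "hi" and child.get("type") == "x-JesusSaid":
            inner = open_tag + inner + close_tag
        parts.append(inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)

if __name__ == "__main__":
    # KJV wording, which has no quotation marks in the text itself.
    verse = ('<verse osisID="John.14.6">Jesus saith unto him, '
             '<hi type="x-JesusSaid">I am the way, the truth, and the '
             'life: no man cometh unto the Father, but by me.</hi></verse>')
    for choice in ("red", "bold", "plain"):
        print(render_verse(verse, choice))
```

Nothing is added to or removed from the words of the translation in
any of the three renderings; only the wrapper around the marked span
changes.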
>Appreciate your interest in OSIS and your hard work to bring the
>Bible to more people.
Thank you. I appreciate your efforts to create a Scripture markup
standard that is open, XML-based, widely supported, and functional for
many purposes that are near and dear to my heart. I think that you are
almost there.
A long time ago (before I knew of the existence of SIL/UBS SFM, and
before XML was made a standard), and on a continent far away from the
island I'm writing from, I invented a little-known Bible markup
standard called GBF <http://eBible.org/bible/gbf.htm>. It is based on
a very minimalist philosophy of using just the minimum necessary
markup to fully specify just the canonical text of the Holy Bible,
plus very little more to cover things I thought I might need some day.
It works. I can automatically convert from that format to HTML, PDF
ready for typesetting, various formats used by Bible study programs,
and more. It is a bit too minimalist to use for everything that SIL,
EBT, UBS, and various other organizations want from a Scripture markup
standard, but I find that it is still a good measure of a markup
standard. Because it is so minimalist, but also contains all of the
elements needed to produce the main text of a Bible, I should be able
to losslessly convert from GBF to any good Bible markup standard with
a simple computer program, give or take a little metadata. I could
convert GBF to OSIS, right now, by ignoring the existence of <q> and
using <hi> to mark Jesus' words. Alternatively, I could create some
convoluted special code to distinguish between the final apostrophe
used to mark plural possessives and the exact same character used to
indicate the end of a second level quotation, replace all opening
quotes with q start milestones, except for paragraph start opening quote
reminders (which would be discarded), replace all close quotes with q
end milestones, and then, to top it all off, code the inverse
operation and generalize the whole mess to accommodate other
languages. It is easier to lobby for a change to OSIS or update GBF to
an XML format, I think. In the meantime, you would do well to convert
some more texts yourselves and see some of these issues in a more
practical light.
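
To give a flavor of that special code, here is a deliberately
simplified sketch for English curly quotes only. The "{q+}" and "{q-}"
strings are placeholders standing in for whatever start and end
markers the target format uses, the function name is made up, and the
apostrophe rule is a heuristic assumption. It does not discard
paragraph-start reminders, and it will misread a plural possessive
that occurs while a second level quotation is open, which is exactly
the ambiguity I am complaining about:

```python
OPEN_D, CLOSE_D = "\u201c", "\u201d"   # double curly quotes
OPEN_S, CLOSE_S = "\u2018", "\u2019"   # single curly quotes / apostrophe

def quotes_to_markers(text):
    """Replace quotation marks with placeholder start/end markers."""
    out = []
    stack = []  # opening characters of the quotations still open
    for i, ch in enumerate(text):
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if ch in (OPEN_D, OPEN_S):
            stack.append(ch)
            out.append("{q+}")
        elif ch == CLOSE_D and stack and stack[-1] == OPEN_D:
            stack.pop()
            out.append("{q-}")
        elif ch == CLOSE_S:
            inside_word = prev_ch.isalpha() and next_ch.isalpha()
            if not inside_word and stack and stack[-1] == OPEN_S:
                stack.pop()
                out.append("{q-}")
            else:
                out.append(ch)  # treated as an apostrophe
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(quotes_to_markers("\u201cHe said, \u2018Don\u2019t go.\u2019\u201d"))
    print(quotes_to_markers("the disciples\u2019 boat"))
```

Even this much covers one language and one style, and the inverse
operation still has to be written.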
Anyway, thank you for at least considering my requests and reading
this far.
>Hope you are having a great day!
I am. Likewise!
Michael
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm
iD8DBQE//PetRI/gxxfXR7sRAij/AJ99RhGv4YrmDmyqKTAh8j95PxelhgCgq/xH
QUmzIC4bUOLa/x91veJ1yqs=
=hDvW
-----END PGP SIGNATURE-----