[osis-editors] Re: [IPS] Re: OSIS: history, quotation marks,
any other issues?
Kahunapule Michael P. Johnson
Michael_Paul_Johnson at sil.org
Fri Mar 18 20:00:58 MST 2005
Doug Higby wrote:
>Dear Michael and Todd,
>
>Forgive me for entering into a rather complex discussion with a rather simple concern about quotations.
>
>
Actually, your simple concern is very important, and one that I have
considered. The other ones are:
* versatile reformatting of punctuation to display both "verse list" and
"paragraph oriented" Scriptures from the same source
* automatic adjustment of quotation marks when pulling a quote
containing quotes from a Scripture portion
* initial generation of quotation punctuation in a specific Scripture
translation, then optionally "freezing" the results.
Balance those with the following concerns:
* keeping OSIS readers simple by not requiring intimate knowledge of
every language and style on Earth.
* honoring the punctuation decisions of existing translations and
publishers, and respecting their copyrights
* facilitating automated, lossless conversion between significant Bible
interchange formats.
There is a way to do both. This is not an either-or proposition. The
markup used should be flexible enough to deal with the applications it
is used for, and the languages and styles it is used to encode.
>Michael, you insist that quotations are punctuation that is an integral part of the scripture text and should not be coded as anything but actual text.
>
That is only partially correct. Punctuation, including quotation
punctuation, should always be allowed to be encoded as actual text.
There are times when it makes sense to encode quotations with markup.
> And the OSIS standard, as I surmise, is coding them as markup rather than text.
>
>
That is what I understand.
I would like to allow you or any other Bible translator to use markup to
generate quotation punctuation as you see fit, and to be able to do it
differently for different situations where it is appropriate for your
language.
I strongly object to requiring that quotation punctuation be coded as
markup and not as actual text, always, for all languages, all
translations, and all time. I insist on having the OPTION of coding
punctuation as actual text, and not as markup. Unmodified OSIS is
unsuitable for my applications (including most SIL applications) because
of this restriction.
I would prefer to be able to use markup to indicate quotation
punctuation for your case and for the others that I mentioned, when the
translator wants to do that.
When the Scripture file moves from the translator to the publisher, then
I would like for the publisher to be able to surmise from the markup
itself how to correctly generate quotation punctuation without reference
to another document; if not in your source document, at least in a
processed generated document used for exchange purposes.
The OSIS standard, at least as of the last time that I looked, does not
allow the use of quotation punctuation as part of the text AND marking
quotations that may be desirable to render with a different character
style. It has other minor defects that I could live with, but this one
is fatal. That means that I am opposed to using unmodified OSIS until
this defect is corrected.
A good Bible XML markup schema would:
* Allow 100% exact specification of where every single punctuation mark
goes, including quotation punctuation if desired by the translators,
without reference to any external style sheets.
* Allow markup for quotation start and end points, and even for
quotation reminders (which might be automatically generated), for the
purpose of generating quotation punctuation in different circumstances.
The automatic generation need not be included in the markup standard if
the results of that process can be unambiguously encoded.
* Allow for 100% automatic lossless conversion to and from USFM.
Unmodified OSIS does only one out of those three, unless this has
changed very recently. It isn't a matter of implementation. It is a
matter of what is possible to implement given the USFM and OSIS standards.
The best solution I could imagine is embodied in USFX
(http://ebt.cx/usfx/) already. In that case, quotation start, quotation
end, and quotation reminder markup exists, but (1) is not mandatory to
use, (2) those elements act as containers for the actual quotation
punctuation to be used at that point, and (3) it is easy to run a
separate process to generate or regenerate quotation marks in any style
you please, embedding the results in your document in such a way that
not every USFX reader has to understand the punctuation rules of your
language-- just the one you use to generate or regenerate your punctuation.
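
As a rough illustration of point (3), here is a minimal Python sketch of
such a regeneration pass. It is not actual USFX tooling: the element names
quoteStart and quoteEnd, the level attribute, and the STYLES table are
assumptions made for this example, so check the USFX schema for the real
names before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical style table: marks per nesting level, cycled if nesting goes deeper.
# The tag names "quoteStart" and "quoteEnd" are assumed, not taken from the schema.
STYLES = {
    "english": {"quoteStart": ["\u201c", "\u2018"], "quoteEnd": ["\u201d", "\u2019"]},
    "french": {"quoteStart": ["\u00ab\u00a0", "\u201c"], "quoteEnd": ["\u00a0\u00bb", "\u201d"]},
}


def regenerate_quotes(xml_text, style):
    """Rewrite the punctuation stored inside each quotation container."""
    root = ET.fromstring(xml_text)
    marks = STYLES[style]
    for elem in root.iter():
        if elem.tag in marks:
            level = int(elem.get("level", "1"))
            choices = marks[elem.tag]
            # The generated mark is stored as ordinary text inside the container.
            elem.text = choices[(level - 1) % len(choices)]
    return ET.tostring(root, encoding="unicode")


def plain_text(xml_text):
    """What a simple reader does: ignore the markup and emit the text as stored."""
    return "".join(ET.fromstring(xml_text).itertext())
```

Only regenerate_quotes needs to know any punctuation rules. A reader that
calls plain_text alone still shows whatever marks were last generated,
which is the "freezing" behavior described earlier in this message.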
(Note: USFX isn't intended to do everything that OSIS can do, but it can
do everything that USFM can do plus a few things OSIS can do but USFM
can't do. I didn't invent USFX to compete with OSIS, but to solve a
problem that even slightly modified OSIS couldn't solve. Indeed,
authoring in USFX then converting to OSIS would be a good way to produce
OSIS files for a lot of people, as it can handle pretty much everything
the ordinary working linguist would use, and it is much simpler.)
Here is the essential difference between the religion of OSIS and my
rather pragmatic view of the many uses of a Scripture interchange format
file: I don't trust programmers, publishers, and people who don't even
speak the language of a Scripture translation to always generate the
correct quotation punctuation from markup. I don't believe that a few
simple rules are sufficient for people other than the translators to get
it right. There are stylistic decisions and exceptions to rules that are
intentionally made. For example, even in the extremely simple case of
the World English Bible, there are at least two intentional exceptions
to the rules in the quotation punctuation checking program that I wrote.
I do, however, trust the translators to provide a set of rules to
generate quotation punctuation from markup, or even multiple sets of
rules as options. I want the translators to be able to intentionally
specify exceptions to the rules, if necessary. I want to have the
translators' rules and exceptions "stick" when they pass the Scripture
file on to others. I don't want those others to have to know or
understand all the rules and exceptions, but to be able to simply read
the markup and display the results, and have them be right according to
the translator.
OSIS could be easily modified to do that. I would not be so vociferous
about this problem, had they done so many moons ago when I first brought
this problem to light.
In short, I'm on your side, Doug, but I am still opposed to using OSIS
as it is currently specified.
Does that make sense?
>Here is a case for you to consider:
>
>I am publishing the New Testament in Fulfulde over the next month in Dallas, and we have adopted a quotation system that follows the French system for direct speech. We mark direct speech with an m-dash at the beginning of the line as in:
>
>Peter said:
>--You are the Messiah, the Son of God.
>
>I am not confident that we will keep this form of punctuation, and some day, when we print the entire Bible, we may want to switch to using the angle brackets that both open and close the punctuation. Both are acceptable forms of punctuation.
>
>I would much rather have my data stored in a format where the markup was aware of where the quote started and where it stopped. The system I am currently using is opened with the m-dash, but can be closed by any number of format markers. Some format markers allow the quotation to continue with a new paragraph with no additional markers other than that the new paragraph is indented to the same level as the one above. I know there are other complex quotation systems for another reason, too. If you go to the quotation checking utility built into Paratext, you will find that they have to determine the following information to see if a quote is closed properly or not:
>
>Data fields:
>Quotes:
>Quotes in Quotes:
>Quotes in Quotes in Quotes:
>Continue quotes (are quotes continued at each new paragraph?)
>Continue quotes in quotes
>Continue quotes after these markers:
>
>It isn't worth explaining the full purpose of these fields except that they are to help Paratext check to see if quotes have been properly terminated and marked.
>
>As complex as this model is, it can't handle my quotation system, and I have to check quotes by hand to a large degree.
>
>
This is a good argument against using markup to automatically generate
quotation marks, isn't it? I understand that you want to mark the
quotation start and stop points, but do you want some programmer in
India who never met you to write the rules for placing your quotation marks?
>To me, the markup language would be the ideal place to signal when a quote starts and when it stops. If the markup language permitted this, I would be able to switch from one quotation system to another, based on the media.
>
>
Yes, IF you were able to write the rules easily enough and embed them
into every OSIS reader on the planet.
>Example: The Parole de Vie French translation used the m-dash system for their New Testament, but when they came out with the Old Testament, they switched to an open/close system using angle brackets. They probably did this because the text had to be a smaller point size for the whole Bible, and also because they needed to conserve more space in the whole Bible, since the m-dash quote system they used created a lot of white space that couldn't be sacrificed in the whole Bible printed edition.
>
>I can't argue that the quotation system is an integral part of the text. I would instead argue that the quotation marking system is part of the markup language. The benefits of such are:
>
>1. Software can easily check the integrity of the quotation system, which is overly complex to accomplish with existing USFM.
>
>
You could check the open/close quote matching and nesting, but you
couldn't check the punctuation any more easily than you can now.
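
To make that distinction concrete, here is a minimal Python sketch of the
part that markup does make easy: checking that quotation start and end
points match and nest properly. The event list format, a sequence of
("start", id) and ("end", id) pairs, is an assumption for illustration
and is not part of OSIS, USFM, or USFX. Nothing in it looks at the
punctuation characters themselves.

```python
def check_quote_nesting(events):
    """Return a list of problems found in a sequence of quote start/end events."""
    stack = []      # ids of quotes that are currently open, innermost last
    problems = []
    for position, (kind, quote_id) in enumerate(events):
        if kind == "start":
            stack.append(quote_id)
        elif kind == "end":
            if quote_id not in stack:
                problems.append(f"event {position}: end of {quote_id!r} with no matching start")
                continue
            # Anything opened after this quote should have been closed first.
            while stack[-1] != quote_id:
                inner = stack.pop()
                problems.append(f"event {position}: {inner!r} still open when {quote_id!r} ended")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    for quote_id in stack:
        problems.append(f"quote {quote_id!r} was never closed")
    return problems
```

An empty result means only that the starts and ends pair up and nest.
Whether the right marks appear at those points still depends on the rules
and exceptions of the language.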
>2. The quotation marks can be adapted to what works best with the media and format: Web page, PDA screen, Large print edition, New Testament only, Whole Bible, Passage excerpts.
>
>
Yes, IF you have the rules for your language encoded into a standardized
style sheet readable by every OSIS reader in the galaxy, OR you limit
the processing to a few OSIS processors that "understand" your language
and embed the results into as many OSIS texts as are appropriate.
The unspoken assumption in OSIS is that someone else will deal with the
quotation punctuation generation problem using a style sheet that should
be easy to generate and use. Yeah, right. A double positive can make a
negative in English.
>
>Doug mailto:Doug_Higby at sil.org
>
>Monday, February 21, 2005, 11:23:26 AM, you wrote:
>
>
>
>Todd Tillinghast> Kahunapule Michael Paul Johnson,
>
>Todd Tillinghast> I believe I am the only person
>Todd Tillinghast> from the OSIS technical team who is
>Todd Tillinghast> subscribed to this list. I
>Todd Tillinghast> apologize for not responding earlier, my son
>Todd Tillinghast> just got out of the hospital yesterday.
>
>Todd Tillinghast> See below.
>
>Todd Tillinghast> Todd
>
>
>
>>>Official acceptance of OSIS as the SIL standard for Scripture markup in
>>>XML was probably premature. Then again, SIL promoted XSEM for the same
>>>purpose for a while, so maybe this standard is transient, too.
>>>
>>>
>
>Todd Tillinghast> I believe the flow of events
>Todd Tillinghast> occurred slightly differently.
>Todd Tillinghast> XSEM was developed as a first
>Todd Tillinghast> step to demonstrate the opportunities and
>Todd Tillinghast> possibilities. XSEM might have
>Todd Tillinghast> grown into a production standard but
>Todd Tillinghast> instead SIL encouraged the
>Todd Tillinghast> formation of an "industry wide" standard
>Todd Tillinghast> rather than an SIL only standard.
>
>Todd Tillinghast> SIL has participated in OSIS from
>Todd Tillinghast> the beginning and has maintained the
>Todd Tillinghast> position that it will use OSIS
>Todd Tillinghast> rather than XSEM once .
>
>Todd Tillinghast> As a result I believe that SIL is
>Todd Tillinghast> a participant in OSIS more as a parent
>Todd Tillinghast> giving birth to a child rather
>Todd Tillinghast> than by adoption. In either case, I
>Todd Tillinghast> think the issue is more one of
>Todd Tillinghast> how can we train the child we have rather
>Todd Tillinghast> than looking to adopt another one.
>
>
>
>>>I very
>>>much like having a well-thought-out XML standard for Scripture
>>>interchange that meets the needs of Bible translators, Bible publishers,
>>>and Bible study software writers and publishers. OSIS is one of the top
>>>contenders for that role in technical terms, in spite of its many
>>>shortcomings. Most of those shortcomings are basically cosmetic, and at
>>>least one is a serious problem. If OSIS is modified such that
>>>standard-compliant use of OSIS always guaranteed that all of the text
>>>(including punctuation) of a Bible translation was preserved, and if
>>>truly lossless bidirectional conversion between USFM and OSIS were
>>>possible, then I would support OSIS as a standard. It would take very
>>>little modification of OSIS to make it so.
>>>
>>>
>
>Todd Tillinghast> Do you think you could enumerate
>Todd Tillinghast> the issues you have with OSIS so that
>Todd Tillinghast> they can be addressed?
>
>Todd Tillinghast> I believe the serious issue you
>Todd Tillinghast> are referring to is the issue of marking
>Todd Tillinghast> quotes as punctuation vs with
>Todd Tillinghast> markup. The mapping between USFM and OSIS
>Todd Tillinghast> being finalized at present, does
>Todd Tillinghast> not map simple quotations from USFM to
>Todd Tillinghast> OSIS because there are no format
>Todd Tillinghast> markers for simple quotations within
>Todd Tillinghast> USFM.
>
>Todd Tillinghast> There are format markers for
>Todd Tillinghast> alternate readings, quotes within notes,
>Todd Tillinghast> old testament quotes, the words
>Todd Tillinghast> of Jesus, various embedded texts, etc...
>Todd Tillinghast> that are mapped between USFM and OSIS.
>
>Todd Tillinghast> The result is that when
>Todd Tillinghast> converting between USFM and OSIS all of the
>Todd Tillinghast> "marks" used to indicate that a
>Todd Tillinghast> quote is starting, continuing, and/or
>Todd Tillinghast> ending are encoded in OSIS the
>Todd Tillinghast> same way they are in USFM -- as quote
>Todd Tillinghast> marks in the text. As a result
>Todd Tillinghast> when converting from OSIS to USFM the
>Todd Tillinghast> quotation marks flow back the same way they came in.
>
>Todd Tillinghast> To respond to your earlier post
>Todd Tillinghast> regarding \wj and \qt:
>Todd Tillinghast> \qt maps to <seg type="otPassage">
>Todd Tillinghast> and
>Todd Tillinghast> \wj maps to <q who="Jesus">
>
>Todd Tillinghast> In further response to the issue
>Todd Tillinghast> you brought up regarding retaining the
>Todd Tillinghast> mark used to punctuate quotes, I
>Todd Tillinghast> brought up the possibility of adding a
>Todd Tillinghast> "mark" attribute to <q> (and
>Todd Tillinghast> possibly <milestone> and <speech>).
>
>Todd Tillinghast> The proposed solution would look something like:
>Todd Tillinghast> <q sID="abc" mark="[whatever the
>Todd Tillinghast> starting mark is]"/>the text of the
>Todd Tillinghast> quote<q eID="abc" mark="[whatever
>Todd Tillinghast> the ending mark is]"/>
>
>Todd Tillinghast> The current proposal would
>Todd Tillinghast> require the encoding of quotes using
>Todd Tillinghast> milestones IF you need to specify
>Todd Tillinghast> the mark for the end of the quote.
>Todd Tillinghast> The reasoning being that most
>Todd Tillinghast> quotes in scripture overlap other elements
>Todd Tillinghast> and are milestoned anyway and
>Todd Tillinghast> only one attribute is added.
>
>Todd Tillinghast> A complicating issue is how to
>Todd Tillinghast> differentiate between the explicit
>Todd Tillinghast> encoding of no quotation mark and
>Todd Tillinghast> not encoding a "mark" attribute at
>Todd Tillinghast> all. The likely solution would
>Todd Tillinghast> be to make a default value for "mark"
>Todd Tillinghast> that is not an empty string.
>Todd Tillinghast> Something like "none" if no "mark"
>Todd Tillinghast> attribute is encoded.
>
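
For illustration only, here is a minimal Python sketch of how a reader
might handle the "mark" attribute proposed in the quoted text above. It
assumes one possible reading of that proposal: a missing attribute is
treated as the default "none", meaning nothing was specified and the
reader falls back to its own marks, while an empty string means the
translator explicitly wants no mark. The wrapper element and the
fallback marks are made up for this example.

```python
import xml.etree.ElementTree as ET

UNSPECIFIED = "none"  # assumed default when no "mark" attribute is encoded


def render_quotes(fragment, fallback_open="\u201c", fallback_close="\u201d"):
    """Render a flat fragment containing milestone <q sID/eID mark="..."/> elements."""
    # Wrap the fragment so that it parses as a single XML element.
    root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    pieces = [root.text or ""]
    for elem in root:
        if elem.tag == "q":
            mark = elem.get("mark", UNSPECIFIED)
            if mark == UNSPECIFIED:
                # Nothing was specified, so the reader picks its own mark.
                mark = fallback_open if elem.get("sID") is not None else fallback_close
            pieces.append(mark)  # an empty string adds nothing, as requested
        else:
            pieces.append("".join(elem.itertext()))
        pieces.append(elem.tail or "")
    return "".join(pieces)
```

With mark="--" on the start milestone and mark="" on the end milestone,
the Fulfulde example earlier in this thread would come out with the
opening dash and no closing mark, without the reader knowing any rules
for that language.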
>
>
>>>A practical standard will attract a following based on its merits.
>>>
>>>
>
>Todd Tillinghast> Agreed.
>
>
>
--
Kahunapule M. P. Johnson <Michael_Paul_Johnson at sil.org>
http://eBible.org/mpj/