[sword-devel] Re: [osis-editors] OSIS 2.0.1 modules updated

Patrick Durusau sword-devel@crosswire.org
Wed, 17 Mar 2004 09:29:42 -0500


Michael,

Thanks for the quick response!

Preliminary reply below, more to follow.

Michael Paul Johnson wrote:
> At 06:46 17-03-04, Patrick Durusau wrote:
>  >Michael,
>  >
<snip>

>  >
>  >Is there some case in particular that is a problem? Realizing that it is
>  >going from one language to another where handling of quotations gets
>  >real messy. Best we can do is mark the quotes accurately and without
>  >ambiguity.
> 
> Let me try to explain again what I meant. In short, I don't believe that 
> OSIS provides enough information to the processors of OSIS texts to 
> reliably regenerate the correct punctuation with regard to quotation 
> marks. That is because it does not.
> 

Part of our difference (but not all, see below) may be where we place 
responsibility for the rendering of quotes.

From what you say in the next paragraph, I take it that you think an 
encoding should "compel" the same rendering of quotes as the original 
text that is being encoded. Either you follow the encoding and get a 
correct result or you deviate from it and get an incorrect result. Is 
that a fair summary?

The reason I ask is that from my text encoding background, quotes 
(separate from how they are rendered) must be encoded in a way that 
allows the user to always distinguish a quote from other quotes as well 
as other material in the text. From my perspective, I never reach the 
question of rendition until someone wants me to actually render the text.

The question then is: given this encoding, what presentations are or 
are not possible? That is the question I will be attempting to answer 
below.


> It is not good enough to "allow" the use of different characters for 
> quotation marks. It is not good enough to "allow" the generation of 
> quotation continuation reminders at the beginnings of verses and/or 
> paragraphs. These rules MUST be specified OR the quotations MUST be 
> already in place, rendered correctly. Otherwise, you are inviting 
> programmers to change the punctuation in Bible texts. That may not 
> bother you, but it strikes me as being WRONG. Let the Bible translators 
> and publishers control the punctuation for each language and each 
> translation. Please.
> 
I disagree that OSIS is inviting programmers to change the texts, or 
even their presentation.

When you say the "rules MUST be specified or the quotations MUST be 
already in place" are you saying the rules must be part of the encoding?

Where I am getting lost is in the difference you see (and I don't) 
between the encoding somehow representing the rules and my stylesheet 
supplying the rules. In either case, the presentation is the same. (At 
least on my assumption that our encoding is sufficient, which I address 
below.)

Are we disagreeing about where the rules for rendering should be placed? 
Normally rendering is separated as much as possible from content, but I 
don't know of any common markup system that does that entirely. Witness 
our <hi> element for instance.

One possibility, depending upon the degree of rendering information you 
wish to embed, would be to use PIs (processing instructions), but I 
don't have time to cover that in detail at the moment.


> Not every language uses the same rules for quotation punctuation as 
> English. Not even every dialect of English, nor even every modern 
> English Bible translation uses the same rules. Even languages that use 
> mostly the same alphabet and punctuation as English may use different 
> quotation marks. If I were implementing an OSIS reader right now, and 
> desired to faithfully reproduce the punctuation of the original 
> translation based on what was in the OSIS text alone, I could not. Not 
> even for modern English.
> 
Think we need to separate out the issue of quotation marks in different 
languages from reproducing a particular edition.

My reasoning is that a particular edition is already in a specific 
language and has a known rendering of quotes. The common question is:

Given an OSIS encoded text, can I reproduce the rendering of the quotes 
in the language of the text?

That different languages follow different quote rendering traditions is 
really beside the point. We have a text in a particular language and 
that is the language in which it will be rendered.

> Take a good look at a printed NASB. Note that continuation opening 
> quotes are present at the beginning of every verse when a quote is open, 
> there. Take a good look at a printed NIV. Note that continuation opening 
> quotes are present at the beginning of every paragraph when needed, but 
> not at the beginning of every verse. Now consider the Spanish "La Biblia 
> de las Américas." Note that it doesn't usually use quotation marks, but 
> it does mark Jesus' words in red. It uses colons, capitalization, and 
> other hints to indicate quotations. The Spanish RVA uses quotation marks 
> for some quotations, but not for all of them. The Bargam (Madang 
> Province, PNG) New Testament does not use quotation marks like English, 
> but the Borong (Morobe Province, PNG) uses quotation marks in the same 
> manner as the English NIV.
> 
> ALL of those cases present problems for using the <q> element as the 
> current revision of OSIS defines it.
> 
OK, I have the conclusion but not the "why" that underlies it.

From my perspective, if the quotes have been properly marked, we have 
the following rules in the stylesheet (modulo our possible difference on 
where to locate the rules):

NASB: If a quote is open, place a continuation quotation mark at the 
beginning of every following verse until the quote closes

NIV: If a quote is open, place a continuation quotation mark at the 
beginning of every following paragraph (but not verses) until the quote 
closes

For editions that mark the words of Jesus in red, render <q who="jesus"> 
without quotation marks but in red.

True you would need a separate stylesheet for each edition, but I am 
assuming enough variation in rendering for that to be necessary in any 
event.
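
As a rough sketch of how the NASB rule above might look in XSLT 1.0: the 
stylesheet below assumes that each <q> is a container element, that 
verses inside it are marked with milestones (<verse sID="..."/>), that 
the verse milestone for the verse in which a quote opens sits before the 
<q> rather than inside it, and that elements are matched without a 
namespace prefix (a real OSIS file declares a namespace, so the patterns 
would need a prefix bound to it). None of these assumptions comes from 
the OSIS specification; the point is only to illustrate the principle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical NASB-style stylesheet: a sketch, not a tested OSIS
     renderer. Assumes container <q> elements with milestone verses
     inside them, and no namespace on the input elements. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Ordinary quote: generate the opening and closing marks. -->
  <xsl:template match="q">
    <xsl:text>&#x201C;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&#x201D;</xsl:text>
  </xsl:template>

  <!-- NASB rule: a verse that starts while a quote is still open
       gets a continuation mark. Verse milestones outside a quote
       fall through to the built-in rules and produce nothing. -->
  <xsl:template match="q//verse[@sID]">
    <xsl:text>&#x201C;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the NIV rule, the verse pattern would be swapped for a paragraph 
pattern (for example q//p, assuming paragraphs are nested inside the 
quote), with the same caveats. Nested quotes would still get the same 
mark at every level, which is part of the open problem noted below.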

Note that the case we have not discussed is the rendering of multiple 
overlapping and sometimes nested quotes. That I concede is a problem and 
one that we have not entirely addressed. Whether that should be by 
encoding (strictly speaking), PIs or simply stylesheets is an open issue.



> This is NOT acceptable to me. I think I'm pretty reasonable, and I like 
> to use standards in a standard way, but if OSIS stays the same, I will 
> never use it exactly as it was specified. If one of your most active 
> proponents of your standard feels that way, maybe you should look at the 
> problem again?
> 
:-) Appreciate the support and we really do want a solution that works.


> For now, I will continue to recommend that everyone embed correct 
> punctuation directly in the text of OSIS documents and to use <q> in a 
> nonstandard manner to mark Jesus' words, when desired, like the 
> following quote.
> 
> <q sID="Matt.3.15.1" who="Jesus" type="x-doNotGeneratePunctuation" 
> />“Allow it now, for this is the fitting way for us to fulfill all 
> righteousness.”<q eID="Matt.3.15.1" />
> 
> If anyone ignores the type="x-doNotGeneratePunctuation", and generates 
> punctuation from the markers anyway, they will get double punctuation. 
> This is not good. I'm hoping you will change the standard to something 
> that we can actually use and feel good about.
> 
I don't understand the necessity for the 
type="x-doNotGeneratePunctuation" attribute.

How about:

<xsl:template match="q[@who='jesus']">
  <!-- Words of Jesus: no quotation marks, rendered in red -->
  <p><font color="red">
    <xsl:value-of select="."/>
  </font></p>
</xsl:template>

Note that elsewhere I would have the rule for <q> in general, but by 
default XSLT applies the most specific matching rule for an element 
that it can find. In other words, without my doing anything special, I 
can have one rule that marks regular quotes with '"' to open and close 
and another that marks quotes from Jesus with no '"' at all but renders 
them in red, etc. Rules can be selected on the basis of any attribute, 
position in the document, or even the ID of a particular element 
(although I would not recommend that last one).

Not saying you want to use an HTML <p> element but simply to illustrate 
the principle.

Note that this is using XSLT and your mileage may differ if you are 
using non-XML based tools for processing.
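
To make the "most specific rule" point concrete, here is a minimal 
sketch of a complete XSLT 1.0 stylesheet with both a general and a 
specific rule. It assumes container-form <q> elements, no namespace on 
the input, and HTML output; the <span> styling is just a stand-in for 
whatever the target format uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes container <q> elements and un-namespaced input. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- General rule. A bare element name has default priority 0. -->
  <xsl:template match="q">
    <xsl:text>&#x201C;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&#x201D;</xsl:text>
  </xsl:template>

  <!-- Specific rule. A pattern with a predicate has default
       priority 0.5, so it wins for <q who="jesus"> without any
       explicit priority attribute: no marks, red text. -->
  <xsl:template match="q[@who='jesus']">
    <span style="color: red">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats. The comparison in the predicate is case-sensitive, so 
who="Jesus" and who="jesus" are different values as far as the 
stylesheet is concerned. And these templates only see the quoted words 
when <q> is a container; with the milestone form (sID/eID), the words 
are siblings of the empty <q> elements rather than children.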

> The <q> marker as you have defined it has merit when generating 
> quotation punctuation in the first place in a new translation. After 
> that is done, it has no merit, at least for any application that I am 
> concerned with: translation, typesetting, and electronic distribution.

OK, but again you are telling me what you concluded but not why. Granted 
there are some difficult quote cases, but as I pointed out above, I think 
that with the exception of overlapping and nesting quotes, most of them 
are addressable by the encoding or stylesheets. Not trying to be pushy, 
but I think we can work together towards a solution if we can illustrate 
the problem, as I have tried to show a solution above. There may be 
reasons why a particular solution does not appeal to you or work in a 
given context, but that again is something we can address.

Note that truly random typographic markup, quotes or other markers that 
occur in a manner that cannot be described in terms of the structure of 
the text, cannot be encoded or rendered using a stylesheet aside from 
use of PIs or specific stylesheet instructions that address those elements.

It is a fundamental limitation of XML that it cannot, without use of one 
of the mechanisms I mentioned (PIs/specific element styles by ID), 
reproduce random typography. It may be very important and significant 
typography, but structured markup is ill-suited to that purpose. To be 
clear, we can do it; the question is how important it is.
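
For illustration only, the two escape hatches mentioned above (PIs and 
rules keyed to a specific ID) might look something like the following in 
XSLT 1.0. The PI target "quote-mark" is made up for this sketch and is 
not defined by OSIS; the ID is borrowed from the Matt.3.15 example quoted 
earlier; milestone-form <q> elements and un-namespaced input are 
assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. The PI target "quote-mark" and the one-off ID rules
     are hypothetical; neither is defined by OSIS. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- General rules for milestone quotes. -->
  <xsl:template match="q[@sID]">
    <xsl:text>&#x201C;</xsl:text>
  </xsl:template>
  <xsl:template match="q[@eID]">
    <xsl:text>&#x201D;</xsl:text>
  </xsl:template>

  <!-- One-off rules for a single quote, selected by ID: generate
       nothing, because its punctuation is already in the text. These
       patterns have the same default priority (0.5) as the general
       rules, so they need an explicit priority to win. -->
  <xsl:template match="q[@sID='Matt.3.15.1']" priority="1"/>
  <xsl:template match="q[@eID='Matt.3.15.1']" priority="1"/>

  <!-- A PI in the source, written as target "quote-mark" followed by
       the literal character to print, is copied out at that point.
       Stylesheets that do not know the PI simply ignore it. -->
  <xsl:template match="processing-instruction('quote-mark')">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Either mechanism ties the stylesheet or the source to one particular 
edition, which is why neither is attractive as a general solution.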

Get the same issue with academics and XSL-FO. Question there is that a 
text may look "better" with hand inserted micro-spaces between letters 
for an ancient text. Well, do you want to pay someone $20/hour (or more, 
I'm guessing) to typeset 200 pages of text or do you want me to spend 60 
minutes setting up an XSL-FO stylesheet that allows you to render it 
over and over, even after every correction? Is it as good as hand 
typesetting? No, but then it is far cheaper and allows for revision up 
to the point we ship to the printer. Suppose you can guess which one I 
advocate. :-)

> The only reason I use it at all is that you provided no other way to 
> mark Jesus' words for a "red letter edition" of a Bible. Granted, the 
> words of Jesus were not so marked in the original manuscripts, and some 
> people argue that they should not be, but you simply won't get 
> widespread acceptance of a Scripture interchange standard unless you 
> support this traditional feature. People who read such texts can freely 
> choose to use red ink or not, as far as I'm concerned, but the markup 
> should be there and accurate for those who choose to use it.
> 
> I hope that now you understand why my conversion to OSIS software 
> inserts some disclaimers in the revision description element.
> 
>    <revisionDesc resp="Rainbow Missions, Inc. http://RainbowMissions.org">
>     <date>2004-03-14T12.25.09</date>
>     <p>
> This draft version of the World English Bible is substantially complete in
> the New Testament, Genesis, Exodus, Job, Psalms, Proverbs, Ecclesiastes,
> Song of Solomon, and the “minor” prophets. Editing continues on the 
> other
> books of the Old Testament. Apocrypha books in this file are still in rough
> draft form.
> </p>
>     <p>Converted ..\..\web.gbf in GBF to web.osis.xml in
> an XML format that attempts to comply with OSIS 2.0 using gbf2osis.exe.
> (Please see http://ebt.cx/translation/ for links to this software.)</p>
>     <p>GBF and OSIS metadata fields do not exactly correspond to each 
> other, so
> the conversion is not perfect in the metadata. However, the Scripture 
> portion
> should be correct.</p>
>     <p>No attempt was to convert quotation marks to structural markers 
> using q or
> speech elements, because this would require language and style-dependent
> processing. In English texts, the hard part is figuring out what ’ means.
> The other difficulty is that I am not yet convinced that the proper
> punctuation marks would be reconstituted by software that reads OSIS 
> files.</p>
>     <p>The output of gbf2osis marks Jesus' words in a non-standard way 
> using the q
> element AND quotation marks if they were marked with FR/Fr markers in 
> the GBF
> file. The OSIS 2.0 specification requires that quotation marks be 
> stripped out,
> and reinserted by software that reads the OSIS files when q elements are 
> used.
> To convert this to an OSIS 2.0 file, you must either remove all q elements,
> remove the quotation marks around Jesus' quotes, or convince the keepers 
> of the
> standard to change the standard.</p>
>     <p>OSIS does not currently support footnote start anchors. 
> Therefore, these
> start anchors have been represented with milestone elements, in case someone
> might like to use them, for example, to start an href element in a 
> conversion
> to HTML.</p>
>     <p>Traditional psalm book titles are rendered as text rather than 
> titles, because
> the title element does not support containing transChange elements, as 
> would be
> required to encode the KJV text using OSIS title elements.</p>
>     <p>The schema location headers were modified to use local copies 
> rather than the
> standard locations so that these files could be validated and used 
> without an
> Internet connection active at all times (very important for the developer's
> remote island location), but you may wish to change them back.</p>
>    </revisionDesc>
> 

I recall some recent discussion of footnote start anchors but don't have 
it at my fingertips. Can you say a few words about that?

Appreciate the disclaimer with information on what to change, but I 
still don't see the language dependency as being a problem. The choice 
of language was settled when the translation was made, so I think we 
agree that the quotation style is fixed from that perspective.

In other words, you want to duplicate the quotation style of a 
particular language, which to my mind requires a different stylesheet 
(modulo the remaining difficult quote problems).

Apologies for the length of my response! I am working on the schema 
again today and may just be delaying my jump into the regexes. ;-)

Appreciate your support for OSIS and appreciate your help in moving it 
forward.

Hope you are having a great day!

Patrick


> 
> I hope this helps. :-)
> 
> Your fellow servant of Jesus Christ,
> Michael


-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
Patrick.Durusau@sbl-site.org
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model

Topic Maps: Human, not artificial, intelligence at work!