[sword-devel] Re: [osis-editors] OSIS 2.0.1 modules updated

Michael Paul Johnson sword-devel@crosswire.org
Thu, 18 Mar 2004 10:01:56 +1000


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 00:29 18-03-04, Patrick Durusau wrote:
...
> From what you say in the next paragraph, I take it that you think an
>encoding should "compel" the same rendering of quotes as the original
>text that is being encoded. Either you follow the encoding and get a
>correct result or you deviate from it and get an incorrect result. Is
>that a fair summary?

It would be more accurate to say that I insist that the encoding must 
compel the exact same rendering of quotation marks (and all other 
punctuation) as the original text being encoded, regardless of the 
language and style. Is that asking too much? Do I need to look to 
another encoding standard, instead of OSIS?

>The reason I ask is that from my text encoding background, quotes
>(separate from how they are rendered) must be encoded in a way that
>allows the user to always distinguish a quote from other quotes as
>well as other material in the text. From my perspective, I never reach
>the question of rendition until someone wants me to actually render
>the text.

From my perspective, punctuation is part of the text. This includes 
quotation marks. If you claim to encode the text but don't do so in a 
way that guarantees that the text can be reconstructed, including 
punctuation, then you have what I call "lossy encoding." This kind of 
encoding is not acceptable for Bible texts, in my opinion. I 
understand your point of view. I just disagree with it.

>Are we disagreeing about where the rules for rendering should be 
>placed? 

We are disagreeing as to the nature of quotation mark insertion. You 
seem to be of the opinion that this is a rendering issue, much as 
selection of font parameters for the various kinds of encoded text 
elements would be, or how verses are marked (or not marked) in the 
rendered text. I strongly disagree with that perspective. Punctuation 
marks, including quotation marks, are part of the Bible text. If you 
want to include metadata in the XML markup about the quotations, 
perhaps to enable analysis of the text or even to regenerate a 
different version of the text punctuated differently, that is a 
separate issue.

>Normally rendering is separated as much as possible from content, but
>I don't know of any common markup system that does that entirely.

Fine, but I don't regard punctuation marks as a rendering issue. They 
are part of the text, as are periods, commas, apostrophes, colons, em 
dashes, etc. Each of those could be represented with markup, too, but 
I don't see any advantage to doing that, either.

>One possibility, depending upon the degree of rendering information
>you wish to embed would be to use PIs (processing instructions) but I
>don't have time to cover that in detail at the moment.

I don't want to treat quotation marks as a rendering issue. Therefore, 
encoding the rules into the OSIS text could never be more than a lame 
work-around, at best.

>That different languages follow different quote rendering traditions
>is really beside the point. We have a text in a particular language
>and that is the language in which it will be rendered.

No, it is NOT beside the point. I deal with texts in a multitude of 
languages, and I want to keep the text and rendering style information 
separate, but I also want to specify with 100% accuracy exactly where 
every quotation mark of any kind goes.

>OK, I have the conclusion but not the "why" that underlies it.

You now have the "why." You may disagree with it, but you have it. 
This is a major issue, as far as I'm concerned. I refuse to use OSIS 
in the way you envision it, with quotation marks inserted only at 
rendering time. Period. You have given me no good reason to do 
otherwise, and the reasons you have given are ones with which I 
disagree.

>True you would need a separate stylesheet for each edition, but I am
>assuming enough variation in rendering for that to be necessary in
>any event.

Putting the quotation mark rendering rules in the style sheets is not 
acceptable to me. Even if you wanted to edit the way a text was 
punctuated, for example to turn NASB-style quotations and paragraphs 
to NIV-style quotations and paragraphs, I think that should be a 
totally separate process.

>Note that the case we have not discussed is the rendering of multiple
>overlapping and sometimes nested quotes. That I concede is a problem
>and one that we have not entirely addressed. Whether that should be by
>encoding (strictly speaking), PIs or simply stylesheets is an open
>issue.

That is an issue that is not even a concern when the quotation marks 
are treated as text.

>> This is NOT acceptable to me. I think I'm pretty reasonable, and I
>> like to use standards in a standard way, but if OSIS stays the same,
>> I will never use it exactly as it was specified. If one of your most
>> active proponents of your standard feels that way, maybe you should
>> look at the problem again?
>> 
>:-) Appreciate the support and we really do want a solution that works.

OK, then make it so. :-)

>> For now, I will continue to recommend that everyone embed correct
>> punctuation directly in the text of OSIS documents and to use <q> in
>> a nonstandard manner to mark Jesus' words, when desired, like the
>> following quote.
>> 
>> <q sID="Matt.3.15.1" who="Jesus" type="x-doNotGeneratePunctuation" 
>> />“Allow it now, for this is the fitting way for us to fulfill all 
>> righteousness.”<q eID="Matt.3.15.1" />
>> 
>> If anyone ignores the type="x-doNotGeneratePunctuation", and
>> generates punctuation from the markers anyway, they will get double
>> punctuation. This is not good. I'm hoping you will change the standard
>> to something that we can actually use and feel good about.
>> 
>Don't understand the necessity for the
>type="x-doNotGeneratePunctuation" attribute?

That is because the punctuation is already rendered correctly in the 
text, and the q element is only there to facilitate the rendition of 
"red letter" editions for those who want to do so. This attribute is 
there to remind the user of the text that additional punctuation marks 
are not to be inserted, here. See the example, above, and note that 
the quotation marks are already in the text AND a q element (in 
milestone format) surrounds that same quotation. The attribute is 
necessary because I chose not to remove the existing quotation marks, 
for the reasons I already gave you above.
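
As a rough illustration of the behavior described above, the following 
Python sketch shows how software reading such a file might honor the 
attribute. The function name, the regular expressions, and the default 
quotation marks are assumptions made for this example; they are not 
part of OSIS or of any existing tool. Because the closing milestone in 
the example carries no type attribute, the sketch remembers the sID and 
suppresses the matching eID as well.

```python
import re

# Illustrative patterns only; a real reader would use an XML parser.
Q_MILESTONE = re.compile(r'<q\s+([^>]*?)\s*/>')
ATTR = re.compile(r'(\w+)="([^"]*)"')

NO_PUNCT = "x-doNotGeneratePunctuation"


def render_quotes(fragment, open_mark="\u201c", close_mark="\u201d"):
    """Replace milestone <q/> markers with quotation marks, unless the
    start marker says the punctuation is already in the text."""
    suppressed = set()  # IDs whose start marker asked for no punctuation

    def replace(match):
        attrs = dict(ATTR.findall(match.group(1)))
        if "sID" in attrs:
            if attrs.get("type") == NO_PUNCT:
                suppressed.add(attrs["sID"])
                return ""
            return open_mark
        if "eID" in attrs:
            if attrs["eID"] in suppressed or attrs.get("type") == NO_PUNCT:
                return ""
            return close_mark
        return match.group(0)  # not a milestone pair; leave untouched

    return Q_MILESTONE.sub(replace, fragment)
```

A reader that skips the check in the sID branch would emit a second 
pair of marks around text that already contains them, which is the 
double punctuation mentioned earlier.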

>> The <q> marker as you have defined it has merit when generating
>> quotation punctuation in the first place in a new translation. After
>> that is done, it has no merit, at least for any application that I am
>> concerned with: translation, typesetting, and electronic
>> distribution.
>
>OK, but again you are telling me what you concluded but not why?

All I ask of OSIS is that it be a good standard Bible text interchange 
format. To do that, it must be able to represent the entire Bible 
text, including punctuation. It would be nice if it were more elegant 
and efficient, but inelegance and inefficiency are not that important. 
Lossless encoding is essential. If it can't do that, then it is not 
useful to me or the organizations I work with, at least not for the 
applications I would consider using it for.
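
The lossless requirement can be phrased as a round-trip test: strip 
the markup from the encoded fragment and compare what remains with the 
original text, punctuation included. The Python sketch below is only 
an illustration under simplifying assumptions (tags removed with a 
regular expression, only the basic XML entities unescaped); the 
function names are hypothetical.

```python
import re
from xml.sax.saxutils import unescape

TAG = re.compile(r'<[^>]+>')  # simplistic: assumes no '>' inside attributes


def text_of(osis_fragment):
    """Return the character data of a fragment with all tags removed."""
    return unescape(TAG.sub("", osis_fragment))


def is_lossless(original_text, osis_fragment):
    """True if the original text, including every punctuation mark,
    can be recovered from the fragment without any rendering rules."""
    return text_of(osis_fragment) == original_text
```

With punctuation kept in the text, the check passes without knowing 
anything about the language or quotation style. With the marks 
stripped and left to the renderer, it fails unless the reader also 
supplies the right rules for that language and edition.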

The wording of the current documentation for OSIS pretty much demands 
that I treat quotation marks as a rendering issue instead of part of 
the text. I am unwilling to do that. I would rather introduce a 
competing standard than do that. Treating quotation marks as rendering 
issues may make sense when dealing with one or even a small number of 
languages, but the very idea of doing so is repulsive to me when 
dealing with any significant fraction of the world's languages.

>Not trying to be pushy but I think we can work together towards a
>solution if we can illustrate the problem as I have tried to show a
>solution above. There may be reasons why a particular solution does
>not appeal to you or work in a given context but that again is
>something we can address.

If you really want to provide a solution that works, then alter the 
OSIS specification to allow it to be used in the manner that I'm using 
it.

>Note that truly random typographic markup, quotes or other markers
>that occur in a manner that cannot be described in terms of the
>structure of the text, cannot be encoded or rendered using a
>stylesheet aside from use of PIs or specific stylesheet instructions
>that address those elements.

So why would you want to force me to do it that way? That is very 
convoluted and doesn't even directly address the main issue.

>It is a fundamental limitation of XML that it cannot, without use of
>one of the mechanisms I mentioned (PIs/specific element styles by ID),
>reproduce random typography. It may be very important and significant
>typography but structured markup is ill-suited to that purpose.
>Emphasis on the fact we can do it, question is how important is it?

So don't do that. Let the punctuation be in the text.

>Get the same issue with academics and XSL-FO. Question there is that a
>text may look "better" with hand inserted micro-spaces between letters
>for an ancient text. Well, do you want to pay someone $20/hour (or
>more, I'm guessing) to typeset 200 pages of text or do you want me to
>spend 60 minutes setting up an XSL-FO stylesheet that allows you to
>render it over and over, even after every correction? Is it as good as
>hand typesetting? No, but then it is far cheaper and allows for
>revision up to the point we ship to the printer. Suppose you can guess
>which one I advocate. :-)

This is a totally unrelated issue, at least to my way of thinking.

>>    <revisionDesc resp="Rainbow Missions, Inc. http://RainbowMissions.org">
>>     <date>2004-03-14T12.25.09</date>
>>     <p>This draft version of the World English Bible is substantially
>> complete in the New Testament, Genesis, Exodus, Job, Psalms, Proverbs,
>> Ecclesiastes, Song of Solomon, and the “minor” prophets. Editing
>> continues on the other books of the Old Testament. Apocrypha books in
>> this file are still in rough draft form.</p>
>>     <p>Converted ..\..\web.gbf in GBF to web.osis.xml in an XML format
>> that attempts to comply with OSIS 2.0 using gbf2osis.exe. (Please see
>> http://ebt.cx/translation/ for links to this software.)</p>
>>     <p>GBF and OSIS metadata fields do not exactly correspond to each
>> other, so the conversion is not perfect in the metadata. However, the
>> Scripture portion should be correct.</p>
>>     <p>No attempt was made to convert quotation marks to structural
>> markers using q or speech elements, because this would require
>> language- and style-dependent processing. In English texts, the hard
>> part is figuring out what ’ means. The other difficulty is that I am
>> not yet convinced that the proper punctuation marks would be
>> reconstituted by software that reads OSIS files.</p>
>>     <p>The output of gbf2osis marks Jesus' words in a non-standard way
>> using the q element AND quotation marks if they were marked with FR/Fr
>> markers in the GBF file. The OSIS 2.0 specification requires that
>> quotation marks be stripped out, and reinserted by software that reads
>> the OSIS files when q elements are used. To convert this to an OSIS
>> 2.0 file, you must either remove all q elements, remove the quotation
>> marks around Jesus' quotes, or convince the keepers of the standard to
>> change the standard.</p>
>>     <p>OSIS does not currently support footnote start anchors.
>> Therefore, these start anchors have been represented with milestone
>> elements, in case someone might like to use them, for example, to
>> start an href element in a conversion to HTML.</p>
>>     <p>Traditional psalm book titles are rendered as text rather than
>> titles, because the title element does not support containing
>> transChange elements, as would be required to encode the KJV text
>> using OSIS title elements.</p>
>>     <p>The schema location headers were modified to use local copies
>> rather than the standard locations so that these files could be
>> validated and used without an Internet connection active at all times
>> (very important for the developer's remote island location), but you
>> may wish to change them back.</p>
>>    </revisionDesc>
>> 
>
>I recall some recent discussion of footnote start anchors but don't
>have it at my fingertips. Can you say a few words about that?

I thought that I already did that. I look at footnotes as a note that 
pertains to either a range of text or a point in the text. This note 
may be rendered at the bottom of the page, in a pop-up window, or 
whatever, but it contains information about the main text that may be 
helpful but that is not part of the main text. Since the note may 
pertain to a range of text, I have found it useful to mark the 
beginning of the text with a "begin reference" marker (<RB> in GBF), 
then mark the end of the text with an element containing the note 
itself. This way, it is easy to render the text to which the footnote 
pertains as a hyperlink. Also, if you wanted to treat footnotes like 
the JPS Tanach did in print (with superscripted markers at both 
places), you can. OSIS has no equivalent marker, so I put in a generic 
milestone. In rendering footnotes, if there is no beginning footnoted 
text marker, I just render it as a point, for example as a hyperlinked 
asterisk pointing to the note. In print rendering, I usually ignore 
the first marker, but could do something with it.
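
To make the two cases concrete, here is a small Python sketch of the 
range-versus-point rendering just described, producing HTML links. It 
is only an illustration: the milestone type "x-noteStart", the bare 
<note> element, and the "#fnN" anchor names are all assumptions chosen 
for the example, not what gbf2osis actually emits.

```python
import re

# Hypothetical start-of-footnoted-text marker (a generic milestone).
START = '<milestone type="x-noteStart" />'
NOTE = re.compile(r'<note>(.*?)</note>', re.S)


def render_footnotes(fragment):
    """Return (html, notes). Text between a start marker and its note
    becomes a hyperlink; a note with no start marker becomes a linked
    asterisk at that point."""
    out, notes, pos = [], [], 0
    for match in NOTE.finditer(fragment):
        before = fragment[pos:match.start()]
        notes.append(match.group(1))
        number = len(notes)
        head, sep, span = before.rpartition(START)
        if sep:  # range footnote: link the marked text
            out.append(f'{head}<a href="#fn{number}">{span}</a>')
        else:    # point footnote: link an asterisk
            out.append(f'{before}<a href="#fn{number}">*</a>')
        pos = match.end()
    out.append(fragment[pos:])
    return "".join(out), notes
```

A print renderer could take the same input and simply drop the start 
marker, which matches the usual print handling mentioned above.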

This isn't a major problem, just a feature that I miss that would be 
easy to supply. I can live with my current work-around in the current 
OSIS version just fine.

>Appreciate the disclaimer with information on what to change but I
>still don't see the language dependency as being a problem. That
>happened when the translation was made so I think we agree that the
>quotation style from that perspective is fixed.

We obviously don't agree on this point. I guess the next question is 
"Can you humor me and allow the OSIS specification to be flexible 
enough to accommodate my needs as well as yours, even if you disagree 
with my philosophy of quotation mark rendering?"

I suppose I can always just modify the standard to my own taste (which 
I have sort of done) or generate a competing one (which I actually 
have a private draft of), but I would rather see us come to some kind 
of consensus. What is the point of a standard if it isn't really 
standard?

May God bless you with wisdom and insight.
Michael

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm

iD8DBQFAWObnRI/gxxfXR7sRAj7kAKCe4pTUp2g4/liQvySzspgEBQhQGACeLG4y
Owzv1F9ZeQPnuGHBguTrXdY=
=sESR
-----END PGP SIGNATURE-----