[sword-devel] Sword OSIS quotation mark handling question

Mon Apr 30 15:02:54 MST 2007

Kahunapule Michael Johnson wrote:
> DM Smith wrote:
>> Kahunapule Michael Johnson wrote:
>>   
>>> How does the Sword project handle display of OSIS text quotations when:
>>> 1. the <q> or <speech> element is used without a marker attribute,
>>>   
>>>     
>> The speech element is not handled, except to process its content. It is 
>> as if the element were not in the text at all. I think the speech 
>> element is to indicate the speaker, not that what's said is a quote. I 
>> won't mention the element <speech> below.
>>   
> OK. I have no need to generate the <speech> element, as there is no 
> USFM equivalent, so I'll ignore it, too. :-)
>> Assuming that the module's conf does not have osisQToTick=false (i.e. it 
>> defaults to true when not present), then the level attribute determines 
>> the quotation mark that will be used, alternating double quote and then 
>> single quote. If no level attribute is present, then it uses a double quote.
>>
>> It will use the same mark when it gets to </q>.
>>   
> In that case, would open quote reminders be inserted at paragraph and 
> stanza beginnings automatically, or would that require a cQuote 
> milestone to make happen? (I'm just curious. Normally, I'm interested 
> in just making sure this doesn't happen, since the quotation 
> punctuation is already fully specified, and it may not conform to 
> current English usage. However, in the hypothetical case where someone 
> wanted this to happen, I'm curious how it would be done.)

It requires a cQuote milestone to make it happen.
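
As a rough illustration of what a converter would have to emit, here is a minimal Python sketch. The function name, the <p> wrapping and the default marks are assumptions made for the example; it is not code from the Sword engine or from any existing converter.

```python
from xml.sax.saxutils import escape, quoteattr

def continuation_quote(paragraphs, open_mark="“", close_mark="”", sid="q1"):
    """Wrap a quotation that spans several paragraphs in milestoned <q>
    elements, with a cQuote milestone at the start of each later paragraph.

    Hypothetical helper: the <p> wrapping and the sID value are assumptions.
    """
    out = []
    last = len(paragraphs) - 1
    for i, text in enumerate(paragraphs):
        if i == 0:
            # The first paragraph opens the quotation.
            start = f"<q sID={quoteattr(sid)} marker={quoteattr(open_mark)}/>"
        else:
            # Later paragraphs only get the "still inside a quote" reminder.
            start = f'<milestone type="cQuote" marker={quoteattr(open_mark)}/>'
        # Only the final paragraph closes the quotation.
        end = f"<q eID={quoteattr(sid)} marker={quoteattr(close_mark)}/>" if i == last else ""
        out.append(f"<p>{start}{escape(text)}{end}</p>")
    return "\n".join(out)

print(continuation_quote(["First paragraph.", "Second paragraph."]))
```

If no reminder is wanted, the converter simply never emits the cQuote milestone.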

>> The same holds true when milestoned versions of <q> are used, except 
>> that a <q eID="xxx"/> element will not cause the code to look at the 
>> opening <q sID="xxx"/> for a marker attribute. Instead, it will use its 
>> own marker attribute, or its lack, to determine what to output.
>>   
> So in the milestone elements, markers may vary. That is actually good, 
> since sometimes quotes are introduced with an em dash and close with a 
> newline, or some other asymmetrical case.

They can be anything you want them to be, even being more than a single 
character.

>> However, if osisQToTick=false, no quotation mark is used.
>>   
> So osisQToTick=false is essentially equivalent to putting a marker="" 
> attribute on all <q> elements?

Yes.
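
The selection rule described in this thread can be summarized in a short Python sketch. This is only a model of the behaviour as stated here, not the engine's code. It assumes that odd levels map to the double quote, and that an explicit marker attribute is used even when osisQToTick=false; neither detail is spelled out above.

```python
def quote_mark(marker=None, level=None, osis_q_to_tick=True):
    """Return the text to output for a <q> start or end tag.

    marker:         value of the marker attribute, or None if it is absent
    level:          value of the level attribute, or None if it is absent
    osis_q_to_tick: the module's osisQToTick setting (true when not present)
    """
    if marker is not None:
        # An explicit marker, including the empty string, is used as is
        # (assumed here to apply regardless of osisQToTick).
        return marker
    if not osis_q_to_tick:
        # Same effect as marker="" on every <q>.
        return ""
    if level is None:
        return '"'
    # Assumption: level 1 is the outermost quote, so odd levels are double.
    return '"' if int(level) % 2 == 1 else "'"

print(quote_mark(level="1"), quote_mark(level="2"), repr(quote_mark(marker="")))
```

Because each milestone is evaluated on its own, the start and end tags of one quotation can yield different output.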

>>> 2. the <q> or <speech> element is used with a marker attribute,
>>>   
>>>     
>>
>> When the marker attribute is present, it is used.
>>   
> Good. :-)
>>> 3. no <q> or <speech> elements appear, or
>>>   
>>>     
>> Then, as far as Sword is concerned, it is not in a quote.
>>   
> OK... what, exactly, does that mean? Does that make a difference for 
> anything besides the option of rendering Words of Jesus in red (or 
> some other alternate color) for display? Normally, the point of 
> knowing if something is in a quote or not is to display the quotation 
> marks correctly, but if there are no quotation marks to display (or 
> they are already in the text in whatever way is appropriate for that 
> language), then  Sword doesn't actually need to "know" when something 
> is a quote or not, does it? Or is there some search feature or 
> function that I'm not aware of that would use such knowledge?

Maybe I didn't understand 3. If a verse or passage is rendered and a 
<q>, <q/> or </q> is not found, then there is no way of knowing that 
verse or passage is in a quote.

The Sword engine does not know and the front-ends don't try to figure it 
out.

>>> 4. quotation punctuation (“, ‘, ’, ”, «, », —, newline, etc.) appears
>>> outside of <q> or <speech> elements (i. e., not in a marker attribute)?
>>>   
>>>     
>> Any punctuation in the text is produced as is.
>>   
> This is good. Very good. :-)

Some frontends might not be able to handle it.

>> Another feature of OSIS is <milestone type="cQuote" marker="xxxx"/>
>> This is used for a continuation quote. (substitute xxxx with the 
>> appropriate quote mark)
>>   
> This is good to know. I regard this (or something like it) as an 
> essential feature if all quotation marks are going to be put in markup.
>> Words of Christ (WoC) can be indicated by adding who="Jesus" to the <q> 
>> container element or to both the milestone elements. In the KJV, ESV 
>> and upcoming NASB modules, the WoC are marked on a per-verse basis, 
>> using the container form of <q>, with marker="".
>>   
> This is an interesting concept-- and one that is helpful to me. You 
> see, I thought that marking WoC per verse was bad OSIS the way I read 
> the documentation, but it sure makes conversion from USFM (which 
> actually demands that sort of markup) easier (because I don't have to 
> discard adjacent end + start pairs with no actual text in between, 
> just a verse marker), and it also makes display on a 
> verse-by-verse basis (like Sword does) easier if you are working from 
> raw OSIS. The same technique would be useful for translating the USFM 
> \qt ...\qt* markup (which is marked verse-by-verse to indicate OT 
> quotes in the NT) to <q marker="" who="OT" sID="somethingunique"/>...<q 
> marker="" who="OT" eID="somethingunique"/>. If you regard this as 
> acceptable, then I'll just embrace it quickly before anyone objects. :-)

We have had our discussions about this. There are front-end problems 
with marking up only the start and the end of the WoC:
For systems that put each verse in an HTML table cell (as swordweb does 
in parallel view), a verse that has a WoC end quote but no begin quote 
will not display properly.
For Matt 5-7, displaying chapter 6 on its own in any frontend will not 
show the text in "red", because the quote starts in chapter 5 and ends 
in chapter 7.
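
A minimal Python sketch of the per-verse approach follows. It makes the simplifying assumption that a verse is either entirely WoC or not at all; the function name and the input tuple layout are hypothetical, and real text often needs the <q> around only part of a verse.

```python
from xml.sax.saxutils import escape

def mark_woc_per_verse(verses):
    """Emit one container <q who="Jesus" marker=""> per verse.

    verses: list of (osis_id, text, is_woc) tuples (hypothetical layout).
    Every verse carries its own start and end tag, so it can be rendered
    in isolation.
    """
    out = []
    for osis_id, text, is_woc in verses:
        body = escape(text)
        if is_woc:
            # marker="" suppresses quotation marks; the <q> only flags WoC.
            body = f'<q who="Jesus" marker="">{body}</q>'
        out.append(f'<verse osisID="{osis_id}">{body}</verse>')
    return "\n".join(out)

print(mark_woc_per_verse([("Matt.6.1", "...", True), ("Matt.6.2", "...", True)]))
```

With this layout a frontend that shows Matt 6 alone, or a single verse in a table cell, still sees a complete <q> element.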

>
> OSIS is very flexible, and there seem to be many reasonable ways to 
> interpret how Scriptures should be encoded. At this point, there are 
> so many ideas out there, I would like to just start with one goal: 
> encoding OSIS texts from USFM in such a way that Sword displays them 
> properly. If that works, then there is a good chance the resulting 
> OSIS will be of use to others, as well.
>
> Would it be too weird to separate q elements intended for replacing 
> punctuation (with marker specified) from those used for what is 
> essentially a character style (i. e. WoC)? Like <q marker="“" 
> sID="aoeu"/><q marker="" sID="qjkx" who="Jesus"/> (actual quotation) 
> <q marker="" eID="qjkx" who="Jesus"/><q marker="”" eID="aoeu"/>, where 
> the actual quotation may span several verses, and the inside set of 
> markers may be ended and restarted with each verse?

I'm not sure I understand. The important thing is to test it in a Sword 
application to see if it does what you want it to. If it does not, it 
might require a change to the Sword engine or it might be simpler to 
change your transformer.

You can take a look at the XML for the KJV here: 
http://www.crosswire.org/~dmsmith/kjv2006/sword/kjvxml.zip
It is a good example of how to do WoC so that all Sword frontends can 
handle it.

>>> I want to (1) ensure that Bible texts are displayed correctly, and (2)
>>> minimize the amount of manual labor necessary to make #1 happen.
>>>
>>> It should not be necessary to do any manual editing of Bible source
>>> texts in well-formed Unicode USFM to create a valid Sword module. (USFM
>>> or something close to it is the format in which a very large number of
>>> minority-language Bibles exist.) In USFM, quotation punctuation, if any,
>>> is in the text of the document, with no special markup. In an informal
>>> extension to USFM, sometimes << is used for “, < for ‘, etc. (A space is
>>> required to disambiguate “‘ and ‘“.) Speaking of ambiguity, apostrophe,
>>> closing single quote, and (in some languages) glottal stop all use the
>>> same character. This ambiguity, coupled with language and style
>>> considerations, seems to be a serious problem in automatically
>>> converting from either GBF or USFM to OSIS, in general.
>>>   
>>>     
>> I have recently written a quote recognizer in C++. I did find that an 
>> apostrophe is potentially ambiguous, but in the source I was working 
>> with, it was not an issue.
>>
>> Fortunately, my input uses ` for a single quote start and ' for an end 
>> quote. This made disambiguation significantly easier.
>>
>> If you wish, I can send you the routine.
>>   
> I already have some LGPL C# code that does a reasonably accurate job 
> of recognizing quotation marks in English text that I use for checking 
> quotation-mark balancing. It doesn't work very well for other 
> languages, because it uses some English-specific rules to disambiguate 
> apostrophes and closing single quotes, and doesn't even handle the 
> case where the same marker is used for glottal stop. (The latter is 
> bad practice in Unicode, but some people do it anyway.) Does your 
> quote recognizer work for non-English Bibles with different writing 
> systems and different punctuation rules?

I don't know. It does work for Spanish. I'll send you my code and you 
can decide.

>>> I'm wondering if I should choose OSIS or GBF as the target format for a
>>> converter I'm writing. I'm also working on updating the dialect of OSIS
>>> that the World English Bible and HNV are distributed in. While I'm not
>>> in favor of dropping support for GBF, yet, I'm not very thrilled about
>>> the idea of putting any new work into supporting it, either. However, if
>>> I can't make an OSIS module without a lot of manual labor, any
>>> reasonable alternative is worth considering.
>>>   
>>>     
>> Remembering your earlier posts about OSIS's lack of quotation support, I 
>> think I can now say that it provides you the level of control that you 
>> wish. Having done three modules myself, I think that OSIS 2.1.1 is 
>> sufficient for Bible texts.
>>
>> So, I'd suggest OSIS.
>>   
> Indeed, it looks like I have at least two ways to get the level of 
> quotation support I want: (1) always put quotation punctuation in 
> marker attributes of q elements or cQuote milestone elements and 
> specify empty marker attributes when using q just for WoC, or (2) [pause 
> to don body armor and start running] always put quotation punctuation 
> in the text and use q elements with empty marker attributes just for 
> translating USFM \wj ...\wj* and \qt ...\qt* markup on a per-verse 
> basis. Option #1 has the major disadvantage of requiring finding all 
> of the quotation punctuation in text I may not be able to read, let 
> alone understand the grammar of, for conversion purposes. Option #2 
> has the disadvantage of potentially offending certain people who have, 
> at least so far, held the deep religious conviction that all quotation 
> punctuation should live in markup, not the text of the Bible, but it 
> has the major advantage of the simplest, fastest conversion possible 
> from USFM to OSIS, with no manual labor required for each translation 
> (other than making sure the source text is really in Unicode USFM). 
> Although option #2 seems like it would work just fine, at least 
> functionally if not idealistically, I'm concerned that someone might 
> think such texts weren't pure enough OSIS, and not use them. If that 
> is the case, then perhaps I really would be better off going back to 
> GBF... or just punting on this whole converter and moving on to 
> improving my converters to other formats for other Bible study software.

My personal opinion: It is more important to have excellent modules than 
to quibble over this.

>
> In the case where the translators have made use of the <<, <, >, >> 
> quotation markup option in their SFM, which is actually a fair number 
> of them, I would like to convert those to the appropriate q elements 
> with markup specifying the normal equivalent of those markings. I'm 
> loath to mess with apostrophe/ending single quote disambiguation for 
> non-English texts, though. I don't see any benefit to doing so, 
> really, but maybe I'm missing something?

I think that it requires a lot of analysis for each language to 
determine whether the apostrophe disambiguation worked or not. It may 
not be worth the effort.
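
For the unambiguous <<, <, >, >> case, the conversion could look roughly like the Python sketch below. It is only an illustration: the function name, the generated IDs and the use of a level attribute on both milestones are assumptions, and it expects each chunk of text to contain whole quotations. It leaves apostrophes alone entirely.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Informal USFM quote tokens and the marks they stand for.
MARKS = {"<<": "“", "<": "‘", ">": "’", ">>": "”"}
TOKEN = re.compile(r"<<|>>|<|>")  # two-character tokens are tried first

def angle_quotes_to_osis(text, id_prefix="q"):
    """Turn <<, <, >, >> in raw USFM text into milestoned <q> elements."""
    out, stack, pos, counter = [], [], 0, 0
    for m in TOKEN.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text before the token
        pos = m.end()
        tok = m.group()
        mark = quoteattr(MARKS[tok])
        if tok.startswith("<"):
            counter += 1
            qid = f"{id_prefix}{counter}"
            stack.append(qid)
            out.append(f'<q sID="{qid}" level="{len(stack)}" marker={mark}/>')
        else:
            if not stack:
                raise ValueError(f"closing {tok!r} at offset {m.start()} has no opener")
            level = len(stack)
            qid = stack.pop()
            out.append(f'<q eID="{qid}" level="{level}" marker={mark}/>')
    out.append(escape(text[pos:]))
    if stack:
        raise ValueError(f"{len(stack)} quotation(s) left open")
    return "".join(out)

print(angle_quotes_to_osis("He said, <<Go.>>"))
```

A space typed between adjacent tokens to disambiguate them stays in the output as ordinary text.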

>
> What do you think?
>
> Michael