[sword-devel] Re: WEB has missing verses

Michael Paul Johnson sword-devel@crosswire.org
Thu, 22 Jan 2004 17:13:16 +1000


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 16:20 21-01-04, Chris Little wrote:
>On Wed, 21 Jan 2004, Michael Paul Johnson wrote:
>
>> The other is the lack of verse bridge annotation. For lack of a
>> standard way to do it, I encoded osisID attributes as if they were
>> osisRef attributes, allowing ranges, like
>> <verse sID="Gen.1.6-Gen.1.7" osisID="Gen.1.6-Gen.1.7" />Bihain God i
>> tok olsem, “Wanpela banis i mas kamap bilong banisim wara, bai wara
>> i stap long tupela hap.” Orait dispela banis i kamap. God i mekim
>> dispela banis i kamap bilong banisim wara antap na wara daunbilo.
>> <verse eID="Gen.1.6-Gen.1.7" />
>
>osisID attributes may contain a list.  The way to denote a bridge would be
>to list all included verses.  Also, sID & eID need not match the osisID,
>they must simply match each other, so a valid way of expressing your start
>tag would be: <verse sID="Gen.1.6" osisID="Gen.1.6 Gen.1.7"/>

OK. I made my conversion program output that format. It gets 
pretty verbose on bridges of more than two verses, but it works. 
Verbosity obviously was considered a virtue, anyway, in the OSIS 
design. <grin> I retract my complaint about a lack of verse bridge 
encoding method.
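
For anyone writing a similar converter, here is a minimal sketch of the bridge encoding Chris describes: every verse in the bridge is listed in osisID, and sID/eID only have to match each other. This is not my actual converter code; the function name verse_bridge_tags and its parameters are made up for illustration, and it assumes the bridge stays within one chapter.

```python
def verse_bridge_tags(book, chapter, first, last):
    """Return (start_tag, end_tag) milestone strings for a verse bridge.

    Hypothetical helper: assumes the bridge does not cross a chapter
    boundary and that verse numbers are plain integers.
    """
    if last < first:
        raise ValueError("last verse precedes first verse")
    # One osisID entry per verse covered by the bridge, space-separated.
    ids = [f"{book}.{chapter}.{v}" for v in range(first, last + 1)]
    osis_id = " ".join(ids)
    # sID and eID only need to match each other; the first verse is used here.
    start_tag = f'<verse sID="{ids[0]}" osisID="{osis_id}"/>'
    end_tag = f'<verse eID="{ids[0]}"/>'
    return start_tag, end_tag


start, end = verse_bridge_tags("Gen", 1, 6, 7)
print(start)
print(end)
```

The osisID list grows by one entry per verse, which is where the verbosity on longer bridges comes from.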

I now have converted ASV, KJV (with Apocrypha), WEB, HNV, GLW, and Tok 
Pisin Buk Baibel (PDG) texts from GBF into OSIS 2.0.1 format that 
validates against the schema correctly. Some of the meta-data in the 
header could be expounded on, but the Bible text looks reasonable. 
Unlike the WEB example on the Bible Technologies web site, poetry and 
prose formatting are properly preserved. The only significant 
variation from the OSIS 2.0 documentation is that I intentionally 
violated the condition that <q> markers and quotation marks should not 
both be used in the case of marking direct quotations of Jesus Christ 
(i.e., for red-letter editions). I did this to preserve the <FR>/<Fr> 
GBF markup in case someone might want the option of rendering this 
text differently, rather than to require it to be so. The markup I 
used is something like:

 <q sID="Matt.3.15.1" who="Jesus" type="x-doNotGeneratePunctuation" 
 />“Allow it now, for this is the fitting way for us to fulfill all 
 righteousness.”<q eID="Matt.3.15.1" />

(If you see strange three-character sequences with Euro signs above, 
imagine proper typographic quotes in their place.)
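
As a rough illustration of the mapping from the GBF <FR>/<Fr> pair to <q> milestones, something like the following would do it for text within a single verse. It is a sketch only, not the converter source: the function name convert_red_letter is hypothetical, the ID scheme (verse reference plus a running counter) is inferred from the Matt.3.15.1 example, and quotations that continue across verse boundaries are not handled.

```python
import re


def convert_red_letter(text, osis_ref):
    """Replace GBF <FR>...<Fr> pairs with OSIS <q> milestones.

    Hypothetical helper: the existing quotation marks in the text are
    left untouched, and the type attribute asks renderers not to
    generate punctuation of their own.
    """
    counter = 0
    current = None  # ID of the quotation that is currently open, if any

    def replace(match):
        nonlocal counter, current
        if match.group(0) == "<FR>":
            counter += 1
            current = f"{osis_ref}.{counter}"
            return (f'<q sID="{current}" who="Jesus" '
                    'type="x-doNotGeneratePunctuation"/>')
        if current is None:
            raise ValueError("<Fr> found without a preceding <FR>")
        end_tag = f'<q eID="{current}"/>'
        current = None
        return end_tag

    return re.sub(r"<FR>|<Fr>", replace, text)


print(convert_red_letter("<FR>“Allow it now.”<Fr>", "Matt.3.15"))
```

Because the punctuation stays in the text, a renderer that ignores the <q> milestones still displays the verse correctly.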

I like the ability to mark quotations with milestone type elements 
from which punctuation is generated, but I detest the thought of 
requiring it to be done that way. There are just too many variations 
on language and style rules, and the reverse conversion cannot be 
reliably done without manual intervention. Once the punctuation is 
properly in place, there is no benefit to such markup sufficient to 
motivate me to do that. I'd rather spend my energy lobbying for a 
change to the spec to allow Jesus' words to be marked without 
requiring punctuation removal. Punctuation is for the linguists to 
decide, not the XML spec writers.

Uploads from PNG are slow. So far, only the HNV OSIS text is in its 
home at http://eBible.org/hnv/hnvosis.zip. I'll put the other Bible 
files (except for the Tok Pisin text) and the source code for the 
converters in a public place when I get a chance, probably next week.

May God bless you lots!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm

iD4DBQFAD3f6RI/gxxfXR7sRAr+8AJQPuiNaxkT9uyO3+IP611AmT9A2AJ0b97HV
j5RjBJeff+5VRJYMsqSIjQ==
=GrqT
-----END PGP SIGNATURE-----