[sword-devel] OSIS 2.0.1 modules available
Michael Paul Johnson
sword-devel@crosswire.org
Thu, 05 Feb 2004 17:02:44 +1000
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
At 00:22 05-02-04, Chris Little wrote:
>Michael Paul Johnson wrote:
>> American Standard Version http://eBible.org/asv/asvosis.zip
>> God's Living Word http://eBible.org/glw/glwosis.zip
>> Hebrew Names Version http://eBible.org/hnv/hnvosis.zip
>> King James Version http://eBible.org/kjv/kjvosis.zip
>> Melanesian Pidgin http://eBible.org/pdg/TokPisinOSIS.zip
>> World English Bible http://eBible.org/web/webosis.zip
>
>Looks good. I saw just a few issues that need some correction. The
>most important is that <verse> eID's need a value matching the
>preceding sID on another <verse> element. I think this is the only
>issue that actually violates the spec.
Oops! Sorry about that. I have corrected the error in my source code
that did that, and will be uploading updates when I can. (I'm trying
not to be envious of broadband Internet connections available all over
the USA & other more developed nations.)
Of course, this does bring up a question. Should overlapping verses
ever be allowed? I would hope not, but the syntax would seem to allow
it. Perhaps something should be said in the documentation about that.
Actually, the contents of the sID and eID attributes on verse elements
are entirely redundant (assuming you don't overlap verses), but someone
might actually look at them, so I would rather have them be useful. My
intention was to make them the same as the osisID of the first verse
of the verse bridge set (which is the only verse in the case of most
normal verses), as you suggested.
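The unmatched-eID error, and the overlapping verses asked about above, can both be caught by walking the verse milestones in document order. What follows is a minimal sketch using Python's standard xml.etree.ElementTree, not an official OSIS validator. It assumes milestone-style <verse sID/eID> elements only, and it treats any verse that starts before the previous one ends as an overlap. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace, e.g. '{http://...}verse' -> 'verse'."""
    return tag.rsplit("}", 1)[-1]


def check_verse_milestones(xml_text):
    """Return a list of problems with <verse sID/eID> milestone pairing."""
    root = ET.fromstring(xml_text)
    problems = []
    open_ids = []  # sIDs seen but not yet closed, in document order
    for elem in root.iter():
        if local_name(elem.tag) != "verse":
            continue
        sid, eid = elem.get("sID"), elem.get("eID")
        if sid is not None:
            if open_ids:
                # Another verse is still open, so this one overlaps it.
                problems.append(
                    f"verse {sid!r} starts before {open_ids[-1]!r} ends (overlap)"
                )
            open_ids.append(sid)
        elif eid is not None:
            if eid in open_ids:
                open_ids.remove(eid)
            else:
                # Covers an empty eID="" as well as a mistyped ID.
                problems.append(f"eID {eid!r} has no matching preceding sID")
        # Container-style <verse osisID="...">...</verse> elements are not checked.
    problems.extend(f"sID {sid!r} is never closed" for sid in open_ids)
    return problems


sample = (
    '<osisText><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning...'
    '<verse eID=""/></osisText>'
)
for problem in check_verse_milestones(sample):
    print(problem)
```

The sample reproduces the reported bug: an empty eID produces two reports, one for the unmatched eID and one for the sID that is never closed.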
>Aside from that:
>The book <div> elements should have an osisID attribute where you
>used scope.
I'll add an osisID attribute to those and leave the scope. Redundancy
is obviously not a problem in OSIS. I rather think it is regarded as a
virtue. <grin>
>The code for English is "en". You can use "ENG" in the <language
>type="SIL"> element, however. (This isn't yet clear from the manual, of
>course, but I expect the final version of the manual will cover this
>area adequately.)
I did use "en" for English texts in <osisText osisIDWork="WEB"
osisRefWork="Bible" xml:lang="en">, but since I am most interested in
minority languages without two-letter codes, I'd prefer to stick with
the SIL Ethnologue codes wherever practical. For now, "ENG" is good in
the language element. The type is supplied, so it is not ambiguous. If
I nudge people towards supporting Ethnologue language codes, that
would be a good thing.
>Various other issues, like the format of the <identifier type="OSIS">,
>are in flux, and will probably be defined in OSIS 2.1 or the final
>manual. (My current best guess at the value is
>"Bible.en.Rainbow_Ministries.WEB.2004-01-22".)
Actually, that should be "Rainbow_Missions" instead of
"Rainbow_Ministries" for the publisher name. That is easy to adjust,
as it is just a constant in the GBF -> OSIS converter code.
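The publisher name is a single constant in the converter, so the whole identifier can be built in one place. The sketch below follows the guessed example value quoted above. Its field order (work type, language, publisher, work abbreviation, date) is only a reading of that one example; OSIS 2.0.1 does not define it. The function and parameter names are hypothetical and are not taken from the actual GBF -> OSIS converter.

```python
from datetime import date

# Hypothetical converter constant; the guessed format uses underscores, not spaces.
PUBLISHER = "Rainbow_Missions"


def osis_identifier(work, lang, published, publisher=PUBLISHER, kind="Bible"):
    """Build a candidate <identifier type="OSIS"> value.

    Assumed field order, read from the guessed example
    "Bible.en.Rainbow_Ministries.WEB.2004-01-22":
    kind.lang.publisher.work.YYYY-MM-DD
    """
    fields = [kind, lang, publisher.replace(" ", "_"), work, published.isoformat()]
    return ".".join(fields)


print(osis_identifier("WEB", "en", date(2004, 1, 22)))
# prints Bible.en.Rainbow_Missions.WEB.2004-01-22
```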
>> If you care to alter the <q> marker and quote marks to strictly comply
>> with the OSIS 2 documentation, then you face the following
>> difficulties:
>>
>> 1. You MUST provide additional information outside of the OSIS
>> standard to the users of OSIS text that allows the punctuation to be
>> EXACTLY recreated as in the original text. The rules of this
>> recreation and the exact markers used are different for different
>> languages, different dialects, and even for different translations
>> within the same dialect. They aren't even the same for all of the
>> texts above. If you use the <q> marks in the KJV to generate red text,
>> that is OK, but if you generate quotation marks, you are changing the
>> text. The KJV has no quotation marks, nor does the ASV.
>
>I was sympathetic with this position, since it really does make
>conversion from other formats easier, but using <q> is undeniably better.
I still deny that it is better. I remain unconvinced that use of <q>
to generate punctuation should be mandatory. Maybe I just don't like
computer geeks telling linguists & Bible translators what to do. Maybe
I have some valid reasons that you should consider.
I do concede that it is good to allow <q> to be used to generate
quotation marks where it makes sense -- and in some places it makes
lots of sense. I still disagree that it should be mandatory. I might
want to use this feature if I were drafting an entirely new
translation in OSIS (or something that converted more directly to
OSIS, which is more likely), and if I had software in hand to insert
the quotation marks the way they should go for this language and
style. I still think that once that insertion was done, I would prefer
to distribute the resulting text with quotation marks already
generated, and <q> tags, if present, serving only to indicate who the
speaker was. That way OSIS readers don't have to know all language &
style rules pertaining to punctuation for every language (not likely
to happen, really), and OSIS doesn't have to be extended to specify
all of these rules.
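The following sketch shows why the punctuation rules have to come from outside the OSIS file. It is a minimal renderer that turns container-style <q> elements into quotation marks. The style table (QUOTE_STYLES), its entries, and the sample sentence are all invented for illustration. A real renderer would need one such table per language, dialect, and translation style. It would also have to handle milestone <q sID/eID> elements, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-style tables: (open, close) marks by nesting depth.
# None of this information is carried in the OSIS text itself.
QUOTE_STYLES = {
    "en-US": [("\u201c", "\u201d"), ("\u2018", "\u2019")],  # “ ” then ‘ ’
    "no-marks": [("", "")],  # e.g. KJV/ASV: <q> would mark the speaker only
}


def local_name(tag):
    """Drop an XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def render(elem, marks, depth=0):
    """Flatten elem to plain text, wrapping each container <q> in marks.

    depth counts enclosing <q> elements; marks cycle when nesting is
    deeper than the table.
    """
    is_q = local_name(elem.tag) == "q"
    open_mark, close_mark = marks[depth % len(marks)] if is_q else ("", "")
    parts = [open_mark, elem.text or ""]
    for child in elem:
        parts.append(render(child, marks, depth + 1 if is_q else depth))
        parts.append(child.tail or "")  # text after the child belongs to elem
    parts.append(close_mark)
    return "".join(parts)


# Invented sample sentence with a nested quotation.
verse = ET.fromstring(
    '<p>He said, <q who="Jesus">Go and tell him, <q>The master says this.</q></q></p>'
)
for style in QUOTE_STYLES:
    print(style, "|", render(verse, QUOTE_STYLES[style]))
```

The same markup gives different, equally valid text depending on which table the reader software happens to have. That is the point being argued: the choice of table is not recorded anywhere in OSIS.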
>It is true that different languages, dialects, and translations use
>different standards of placing quotation marks. However, there are also
>plenty of instances when the SAME translation demonstrates different
>standards of placing quotation marks, depending on locale, paragraphing,
>and contemporary standards. This is part of why OSIS requires marking
>with <q> rather than typographic quotation marks.
This is NOT a benefit. Rather it is a serious defect in OSIS. The
reason it is a defect is that there is no way to unambiguously specify
how quotation marks must be generated for each language, and if
variants are allowed, then how. As a Bible publisher, I don't like
this. As a Bible translator, this bothers me. It takes control of the
punctuation away from the translators and publishers. It provides more
opportunities to make mistakes. Making reliable software is hard
enough.
As a computer geek, I think it is cool that I could change the way
quotation marks are rendered. I could render the NIV in verse list
format with quotation mark reminders starting at every verse like the
NASB, or render the NASB like the NIV. I could force English
punctuation rules on Spanish or Italian, or vice versa. When
extracting a scripture passage from the middle of a quotation,
punctuation could be adjusted to fit the quotation (i.e., putting
quotation marks around one of the Beatitudes when that is all you
quote). The first option is great if you ARE the publisher. If you
aren't, then I have a gut feeling that it is a good way to further
alienate IBS & Zondervan from us. (They consider the poetry & prose
formatting a translational issue not to be mucked with by the computer
geeks.) The second option is totally without practical merit, and is
really a disadvantage. The third option may be useful, but it would be
a problem if you extracted adjacent sections of Scripture, then
concatenated them.
The bottom line is that until I am convinced that proper punctuation
will ALWAYS be reconstituted by OSIS-compliant software, and that OSIS
itself provides enough information to do that for EVERY language,
dialect, and style variant, I will not support this feature of OSIS as
a mandatory item, nor will I recommend that anyone else does that. If
you want to make it optional, and if you allow me to tag who is making
a quotation without generating punctuation, then I would be happy with
that.
>(Another benefit is the potential for more richly tagged text, with
>speaker information.)
This can be a benefit, when it is done. It can also be a royal pain to
provide, and it isn't worth the effort of doing so for every
translation. I suppose that for translations that are close enough to
each other (i.e., based on the same source text and not too loosely
paraphrased), you could use a clever program to transfer the speaker
tags from one translation to another automatically. Better yet, maybe
you could just do that as a separate database, and merge the
information on demand in the display engine (i.e., in Sword). That
would be better, and wouldn't require everyone to tag their Bibles
that way.
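The separate-database idea might look something like the sketch below. Speaker information lives in its own table keyed by osisID, and the display engine joins it to whichever translation is being shown. This is a hypothetical illustration, not anything Sword provides, and the table, functions, and "woj" class name are invented. It also simplifies to whole-verse granularity. Real speech often starts or ends mid-verse, so a usable table would need finer positions.

```python
import html

# Hypothetical speaker table, maintained once and shared by every translation
# that follows the same versification. Keys are OSIS verse IDs.
SPEAKERS = {
    "Matt.5.3": "Jesus",
}


def merge_speakers(verses, speakers):
    """Yield (osis_id, text, speaker), taking speaker from the external table.

    The translation text itself carries no speaker tags.
    """
    for osis_id, text in verses:
        yield osis_id, text, speakers.get(osis_id)


def to_red_letter_html(verses, speakers):
    """Wrap words of Jesus in a span that a stylesheet can color red."""
    out = []
    for osis_id, text, speaker in merge_speakers(verses, speakers):
        body = html.escape(text)
        if speaker == "Jesus":
            body = f'<span class="woj">{body}</span>'
        out.append(f'<span id="{osis_id}">{body}</span>')
    return "\n".join(out)


kjv = [
    ("Matt.5.2", "And he opened his mouth, and taught them, saying,"),
    ("Matt.5.3", "Blessed are the poor in spirit: for theirs is the kingdom of heaven."),
]
print(to_red_letter_html(kjv, SPEAKERS))
```

Any other translation with the same versification can reuse SPEAKERS unchanged. Only the list of (osisID, text) pairs differs.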
>> 2. If you scan a new Bible text that has correct quotation marks, you
>> probably won't be able to fully automate conversion from those marks
>> to <q> markup.
>>
>> 3. If you fail in doing 1 or 2, above, you may be in violation of
>> copyright, trademark, and/or common law. Worse yet, you shift
>> responsibility before God from the translators to yourself for the
>> accurate transmission of His Word.
>
>Copyright, trademark, common law, aren't involved, though contract law
>might be (depending on your contract).
I beg to differ. Copyright and contract law are combined with the GLW
text, in that the text is copyrighted, but you have permission to do
pretty much anything reasonable with it for free PROVIDED THAT you
don't alter the text. Period. If you change the punctuation, you have
altered the text, and therefore have no permission to make copies
(beyond whatever "fair use" rights you might have, which are pretty
limited these days).
With the WEB & HNV, the text is in the Public Domain, but if you use
the trademarked names, then you are bound to not alter the text as a
condition of using the trademarked names. Otherwise, you have to call
it something different. Again, this is a combination of trademark &
contract law.
In reality, I'm not very likely to sue anyone for screwing up the
quotation marks in the GLW text, but I do have the legal right to do
so.
>Suggesting that you will somehow have "responsibility before God"
>(unless you're intentionally rendering incorrectly) would be pretty
>ridiculous and implies that every typesetter or translator who ever made
>a mistake while working on a Bible (probably all of them) will be held
>responsible for those acts.
It would be foolish to not be careful in dealing with God's Word,
don't you think? No, I don't think God will strike everyone dead who
makes an honest mistake, but I don't want to be one who intentionally
mis-handles God's Word or takes it lightly. On the other hand, the
original Greek and Hebrew manuscripts had no quotation marks. We only
put them in translations because the target languages require them.
They are derived entirely from the context. In a few cases (especially
in the Prophets), it is a judgement call as to where exactly the
quotation marks should go. Therefore, I'm not going to make a holy war
of this issue. Let your Holy-Spirit-sanctified conscience be your
guide.
>> The OSIS spec should be changed to allow separation of quotation mark
>> generation markers from words of Jesus markers.
>
>We probably won't ever see that, precisely because there already exists
>a way to express this.
Sure-- the way I did it. Just change the documentation to say that is
OK. Alternatively, you could redefine <q> to always generate
punctuation and <speech> to never generate punctuation, but allow
either to specify who is speaking or writing. Both are milestoneable
markers used for approximately the same thing, right now.
>There also probably won't ever be anything akin to a note start anchor,
>since it can already be expressed. The first verse of the WEB reads:
>
><verse sID="Gen.1.1" osisID="Gen.1.1" />In the beginning
><milestone type="x-noteStartAnchor" />God<note type="translation">After
>“God,” the Hebrew has the two letters “Aleph Tav” (the first and last
>letters of the Hebrew alphabet) as a grammatical marker.</note> created
>the heavens and the earth.<verse eID="Gen.1.1" />
>
>and could instead be encoded with a <catchWord> to indicate the annotant
>of the <note>: ...
>or with an osisRef with a grain, to explicitly define the range of the
>annotant: ...
Those approaches could work. They are quite contorted to my way of
thinking, but you could spend many man-months making it work in OSIS
generation, conversion, and display for HTML. Even for print, some
printed Bibles use footnote start & end markers. A start marker would
be MUCH easier to convert to HTML hyperlinks, don't you think? I'll
probably never support those methods you suggest. I even cut corners
in that I made no distinction between kinds of notes, because I don't
distinguish between them in the source text format (GBF). Maybe if I
ever used OSIS for a native Bible text format to start editing in, and
if good quality conversions to HTML and other formats already existed,
I might. Of course, only a hard-core computer geek would manually edit
OSIS Scripture texts (i.e., for a new translation) with nothing but a
text editor, so I'll wait to see if anyone generates a Scripture
editor that generates OSIS text that is easier to use than the current
alternatives.
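For comparison, the start-anchor encoding in the Gen 1:1 example above converts to an HTML footnote link with very little code. This is a minimal sketch using Python's standard library. It assumes the anchor and its <note> are siblings in the same container, as they are in that example. x-noteStartAnchor is the user-defined "x-" milestone type from the WEB file, not a standard OSIS type. The function name and the "annotant" class are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def notes_to_html(container):
    """Convert a container's content to HTML plus a list of footnote paragraphs.

    A <milestone type="x-noteStartAnchor"/> opens a span that the next <note>
    closes, so the annotated words can be highlighted and linked.
    """
    body = [html.escape(container.text or "")]
    notes = []
    anchor_open = False
    for child in container:
        name = local_name(child.tag)
        if name == "milestone" and child.get("type") == "x-noteStartAnchor":
            body.append('<span class="annotant">')
            anchor_open = True
        elif name == "note":
            if anchor_open:
                body.append("</span>")
                anchor_open = False
            notes.append("".join(child.itertext()))
            n = len(notes)
            body.append(f'<sup><a href="#note{n}">{n}</a></sup>')
        else:
            # Verse milestones have no text; other elements keep only their text.
            body.append(html.escape("".join(child.itertext())))
        body.append(html.escape(child.tail or ""))
    if anchor_open:  # anchor with no following note: close the span anyway
        body.append("</span>")
    footnotes = [
        f'<p id="note{i}">{i}. {html.escape(text)}</p>'
        for i, text in enumerate(notes, start=1)
    ]
    return "".join(body), footnotes


gen_1_1 = ET.fromstring(
    '<p><verse sID="Gen.1.1" osisID="Gen.1.1" />In the beginning '
    '<milestone type="x-noteStartAnchor" />God<note type="translation">After '
    '\u201cGod,\u201d the Hebrew has the two letters \u201cAleph Tav\u201d (the first '
    'and last letters of the Hebrew alphabet) as a grammatical marker.</note> '
    'created the heavens and the earth.<verse eID="Gen.1.1" /></p>'
)
text, footnotes = notes_to_html(gen_1_1)
print(text)
print("\n".join(footnotes))
```

One pass with a single boolean is enough. The <catchWord> or osisRef-with-grain alternatives would instead require matching text or resolving a reference range before any link can be emitted.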
Don't get me wrong. I almost like OSIS. <grin> I love the idea of a
good Scripture interchange format standard. OSIS seems to have more
support than XSEM, and it is XML, unlike USFM, GBF, or the old STEP
format. If I were starting from scratch, I would do some things
differently, but at this point, I'd rather ride on your octagonal
wheel than reinvent a round one. <grin> If I seem to whine a bit about
it, I'm just trying to get you to round off some of the corners so
that my passengers and I can have a smoother ride.
Take it for what it is worth...
... I'll let you know when I have the (almost) OSIS texts updated &
posted.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (MingW32)
Comment: http://eBible.org/mpj/gpg.htm
iD8DBQFAIep0RI/gxxfXR7sRAjhKAKDz8OSB3LtSn85dup7i7L3ye7g45ACfZvfO
QHkWqWHkSpSYZsb43+Gf3iA=
=KhyB
-----END PGP SIGNATURE-----