<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 03/05/2012 08:07 AM, Peter von Kaehne wrote:
<blockquote cite="mid:4F5500D8.7070504@gmx.net" type="cite">
<pre wrap="">On 05/03/12 17:33, Greg Hellings wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On Mon, Mar 5, 2012 at 11:28 AM, Kahunapule Michael Johnson
<a class="moz-txt-link-rfc2396E" href="mailto:kahunapule@mpj.cx">&lt;kahunapule@mpj.cx&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On 03/05/2012 03:20 AM, Greg Hellings wrote:
</pre>
<blockquote type="cite">
<pre wrap="">You seem quite taken with USFM, but remember that CrossWire and SWORD
do not support USFM as an import or display format. Therefore
information beyond just how to convert USFM into OSIS or ThML or GBF
which are supported is not really of importance.
</pre>
</blockquote>
<pre wrap="">
USFM is the format that literally hundreds of minority-language Bible translations exists in. Are you saying that the Sword Project is not interested in importing those?
</pre>
</blockquote>
</blockquote>
<pre wrap="">
I am not entirely clear what you are aiming at and I must say I do get
somewhat irritated with your tone. I do have a feeling over the last few
days that you are itching to get a fight. Why is that? Is this simply a
misunderstanding?</pre>
</blockquote>
<br>
It is most likely a misunderstanding. Perhaps I have also been
misunderstanding some of the messages that seem to be opposed to
USFM. I'm not trying to suggest that USFM be made an additional
internal format for Sword for Bible search and display, like GBF and
OSIS.<br>
<br>
Please let me be clear about what my goals, agenda, and purpose
really are.<br>
<br>
I have many USFM Bible texts in many languages, and I will soon have
access to many more. I would like to convert them to various formats
for distribution and use, publishing them in ways that maximize
their usefulness and accessibility for study by many people in their
own languages. My primary focus is on minority languages, although
I will also be converting a few translations in languages that have
many more speakers. Sword is one of many possible outputs for
these Scriptures.<br>
<br>
Because of the large number of translations involved, and frequent
updates in the case of translations in progress, I'm not interested
in manual processes. I am only interested in automated processes
that are reasonably efficient and very reliable.<br>
<br>
As far as I'm concerned, it doesn't matter what formats you
store or display Bibles in. It can be the current Sword format set
defined by your API. It can be COBOL code and structured Latin if
you can make it work. What I do care about is what happens when I
convert a Bible translation (or a portion of one) into one of your
import formats, and you import it and display it:<br>
<ol>
<li>You accurately preserve all of the original text and
punctuation (including quotation punctuation) exactly as it was
in the original USFM. This involves the complete process from
module creation to display in all front ends. This is an
absolute requirement with respect to the canonical text. If this
condition isn't met, then I don't have permission to convert
these Scriptures to Sword format, nor do you have such
permission.<br>
</li>
<li>I would prefer to have formatting such as prose and poetry
preserved, and to have noncanonical text such as introductions
and subtitles passed through for display in a way that
differentiates it from the canonical text, although it would
probably be acceptable to strip this information out or to
display it conditionally.</li>
<li>I would prefer to have footnotes displayed in a format that
makes sense for the platform.<br>
</li>
</ol>
In other words, I care about the end-to-end system, primarily.<br>
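<br>
An automated check of the first requirement could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is a single verse containing only character-level USFM markers (no footnotes, headings, or cross-references), and regular expressions stand in for real USFM and XML parsers.<br>
<pre wrap="">
```python
import re
from xml.sax.saxutils import unescape

# A USFM character marker: backslash, optional "+", letters, optional digits,
# then either "*" (closing marker) or one whitespace character (opening marker).
USFM_MARKER = re.compile(r"\\\+?[a-z]+\d*(?:\*|\s)")

# Any XML tag, including empty milestone tags with attributes.
XML_TAG = re.compile(r"<[^>]+>")


def _normalize(text):
    """Collapse runs of whitespace so line wrapping does not affect the comparison."""
    return " ".join(text.split())


def text_from_usfm(verse):
    """Return the verse text with USFM character markers removed."""
    return _normalize(USFM_MARKER.sub("", verse))


def text_from_osis(fragment):
    """Return the text content of an OSIS fragment, with tags removed and entities decoded."""
    return _normalize(unescape(XML_TAG.sub("", fragment)))


def same_canonical_text(usfm_verse, osis_fragment):
    """True if both forms carry exactly the same text and punctuation."""
    return text_from_usfm(usfm_verse) == text_from_osis(osis_fragment)


if __name__ == "__main__":
    usfm = "\\wj “Follow me,”\\wj* he said."
    osis = '<q who="Jesus" marker="" sID="x"/>“Follow me,”<q marker="" eID="x"/> he said.'
    print(same_canonical_text(usfm, osis))
```
</pre>
A check like this only covers the conversion step. Confirming that every front end displays the same characters would need a similar comparison against the rendered output.<br>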
<br>
I don't care if the Sword Project ever supports USFM in any way
except to import it, directly or indirectly through OSIS or another
format, into Sword. I never suggested using USFM or its XML kin in
any other way within the Sword project. I don't care how you display
USFM on your web sites, wiki or otherwise, or what formats you use
internally to the Sword project, as long as it works end to end
without losing a single jot or tittle. However, I do think it is
important that you document the best ways to convert USFM to a
format you can import. I think you do, too, really.<br>
<br>
I am aware that you have some tools to import a small subset of USFM
to a form of OSIS that works with osis2mod, and have created some
modules with it. I'm also aware of the OSIS manual section that
contains a list of OSIS near equivalents for most (but not all) of
the current USFM tags that actually appear in the Bibles I'm working
with. My tests using those tools so far have found them wanting. I'm
going to try to fix that by doing my own conversion from USFM to
OSIS. Please forgive what may have appeared to be criticism without
a constructive purpose. I'm trying to convert Scripture files at a
scale and speed that are apparently unprecedented.<br>
<br>
I intend to write a USFX-to-OSIS converter that produces output that
should validate against the current OSIS schema, and which will
import correctly into Sword modules. (GBF might be an option, too,
but I think that if the difficulties with OSIS can be overcome, it
would be better to use OSIS.) At least that is what I'm going to
try. If I succeed, you need never deal directly with USFM and its
XML kin. You can just send people to a different
open source project for that piece of important functionality.<br>
<br>
There are some things that I will do that may not fit the way some
members of the committee that designed OSIS envisioned things. For
example, in the OSIS files that I generate, all of the quotation
punctuation will be left as part of the Bible text, and never
included in a &lt;q&gt; marker, either implicitly or explicitly with
a "marker" attribute. If I need to mark direct quotes of Jesus
Christ in a particular translation, I will do so by converting USFM
\wj ...\wj* markers directly into &lt;q who="Jesus" marker=""
sID=""/&gt;...&lt;q marker="" eID=""/&gt;, where the marker
attribute is always empty. This should, according to
<a class="moz-txt-link-freetext" href="http://crosswire.org/wiki/OSIS_Bibles#Marking_Quotations">http://crosswire.org/wiki/OSIS_Bibles#Marking_Quotations</a>, result in
lossless display of the proper quotation punctuation in all front
ends that comply with that same interpretation. I don't plan to use
&lt;q&gt; for anything other than direct quotes of Jesus. This usage
is philosophically compatible with USFM and OXES. It is also
actually easier to render, since the Paratext interpretation of USFM
does not allow \wj ...\wj* markers to cross verse boundaries.
Therefore, you don't have to look back past the beginning of the
current verse to determine whether to turn on an optional red-letter
attribute, even in an extended quotation like <i>The Sermon
on the Mount.</i><br>
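<br>
A minimal sketch of the \wj conversion described above, in Python, might look like the following. The function name, the verse_id parameter, and the sID/eID numbering scheme are assumptions made for illustration and do not come from any existing converter. A regular expression stands in for a real USFM parser, and the input is assumed to be the text of a single verse.<br>
<pre wrap="">
```python
import re
from xml.sax.saxutils import escape

# One USFM words-of-Jesus span: "\wj", one whitespace character, the quoted
# text (non-greedy), then the closing "\wj*".
WJ_SPAN = re.compile(r"\\wj\s(.*?)\\wj\*", re.DOTALL)


def wj_to_osis(verse_text, verse_id):
    """Replace each \\wj ...\\wj* span with <q> milestones whose marker is empty.

    verse_id (for example "Matt.4.19") is used only to build unique sID/eID
    values; that ID scheme is an assumption of this sketch.
    """
    count = 0

    def to_milestones(match):
        nonlocal count
        count += 1
        qid = f"{verse_id}.wj{count}"
        return (
            f'<q who="Jesus" marker="" sID="{qid}"/>'
            f"{match.group(1)}"
            f'<q marker="" eID="{qid}"/>'
        )

    # Escape &, < and > first so the result is XML-safe. Quotation marks in
    # the text are left exactly as they were.
    return WJ_SPAN.sub(to_milestones, escape(verse_text))


if __name__ == "__main__":
    print(wj_to_osis("He said, \\wj “Follow me.”\\wj*", "Matt.4.19"))
```
</pre>
Because the quotation punctuation stays in the text and the marker attribute is always empty, the milestones carry only the speaker information, and each pair opens and closes within the same verse.<br>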
<br>
Another thing I will do is convert legacy (deprecated) "display"
markup for bold and italics directly from USFM to &lt;hi
type="bold"&gt; and &lt;hi type="italic"&gt; markers. The reason for
that is that I have translations where I have tried to replace
"display" markup with the appropriate "semantic" markup, only to
find that USFM does not have a suitable replacement for the way
certain translators have chosen to use these text attributes.
Fidelity to the translation and deference to the translation
committees wins out over abstract arguments about separation of
semantics from presentation forms. In essence, attributes that
are considered mostly a presentation issue in some languages
are actually a semantic issue in others. This is not a
winnable argument, so I just perpetuate the use of this kind of
markup and hope that front ends will honor it. The
consequence of not doing so is writing that is uglier and
less clear in the subject languages. There may also be
cases where I preserve the bold and italic markup just because it is
too time-consuming to figure out what it should have been in
each case, based on where it appears, in a language I can't read.<br>
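<br>
The bold and italic pass-through could be sketched in Python along these lines. The function name and the lookup table are hypothetical, the sketch handles only the USFM \bd and \it character markers, and it does not attempt nested markup. A real converter would work from a parsed USFM or USFX tree, not from a regular expression.<br>
<pre wrap="">
```python
import re
from xml.sax.saxutils import escape

# USFM display marker name -> OSIS <hi> type value.
HI_TYPES = {"bd": "bold", "it": "italic"}

# "\bd text\bd*" or "\it text\it*"; the backreference \1 makes sure the
# closing marker matches the opening one.
DISPLAY_SPAN = re.compile(r"\\(bd|it)\s(.*?)\\\1\*", re.DOTALL)


def display_markup_to_hi(text):
    """Convert \\bd and \\it spans to <hi type="..."> elements, keeping the text as is."""

    def to_hi(match):
        hi_type = HI_TYPES[match.group(1)]
        return f'<hi type="{hi_type}">{match.group(2)}</hi>'

    # Escape &, < and > before inserting markup so the output stays XML-safe.
    return DISPLAY_SPAN.sub(to_hi, escape(text))


if __name__ == "__main__":
    print(display_markup_to_hi("In the \\bd beginning\\bd* was \\it the Word\\it*"))
```
</pre>
Nothing here tries to guess a semantic meaning for the markup. The translators' choice of bold or italic is simply carried across unchanged.<br>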
<br>
I hope this helps...<br>
<br>
Shalom,<br>
Michael<br>
<br>
<br>
</body>
</html>