<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">I'm breaking my long period of ignoring
and avoiding OSIS, and working on building a USFX to MOSIS
converter into the open source Haiola software, both into the UI
tool and as a stand-alone cross-platform executable. The "M" in
"MOSIS" is for "Modified". The only significant modification is a
shift in the semantics of &lt;q who="Jesus" sID="somethingunique"
marker=""&gt;: it is used only in milestone form and only for
quotations by Jesus (as an equivalent of the &lt;wj&gt; tag), and,
rather than appearing only at the beginning and end of the
quotation, it stops and starts at verse boundaries. The proper
quotation punctuation for each translation is always in the text of
the translation, where almost all translators believe it belongs. The
result is not exactly in line with the original intentions of
OSIS, but it should validate against the schema, and it should
actually be easier to display. This is a fairly harmless exception
for Sword use, since the result is processed for display on a
verse-by-verse basis anyway.<br>
<br>
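The verse-boundary rule above can be sketched roughly as follows.
This is only an illustration in Python, not the C# code in Haiola.
The token list, the function names and the ".wj" ID scheme are all
made up for the example, and the end milestone is assumed to need
only an eID attribute. Tags are assembled from pieces so that the
listing itself contains no literal markup.<br>
<pre>
```python
from itertools import count
from xml.sax.saxutils import escape, quoteattr


def milestone(name, **attrs):
    """Build an empty (milestone) element from a name and attributes."""
    parts = [name] + [f"{key}={quoteattr(value)}" for key, value in attrs.items()]
    return "<" + " ".join(parts) + "/>"


def to_mosis(tokens):
    """Convert (kind, value) tokens to MOSIS-style markup.

    kind is "verse" (value is an osisID), "text", "wj_start" or "wj_end".
    This token model is hypothetical, not the real USFX structure.
    """
    out = []
    serial = count(1)
    verse = None      # osisID of the verse currently open
    in_wj = False     # True while the source is inside words of Jesus
    open_id = None    # sID of the q milestone currently open, if any

    def open_q():
        nonlocal open_id
        open_id = f"{verse}.wj{next(serial)}"
        out.append(milestone("q", who="Jesus", sID=open_id, marker=""))

    def close_q():
        nonlocal open_id
        if open_id is not None:
            out.append(milestone("q", eID=open_id))
            open_id = None

    def close_verse():
        close_q()  # a quotation never stays open across a verse end
        if verse is not None:
            out.append(milestone("verse", eID=verse))

    for kind, value in tokens:
        if kind == "verse":
            close_verse()
            verse = value
            out.append(milestone("verse", sID=verse, osisID=verse))
            if in_wj:
                open_q()  # resume the quotation in the new verse
        elif kind == "wj_start":
            in_wj = True
            open_q()
        elif kind == "wj_end":
            in_wj = False
            close_q()
        elif kind == "text":
            out.append(escape(value))
    close_verse()
    return "".join(out)
```
</pre>
Run on a quotation that starts in one verse and ends in the next,
this should emit two start/end milestone pairs, one per verse, which
is the kind of state tracking that plain tag replacement cannot do.<br>
<br>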
Getting the conversion right takes more than simple replacement of
tags, e.g. with awk, and it requires really understanding both the
source and destination standards. I'm working in C#, because that is
the tool I know best, although other languages could work, too. It is
the actual logic implemented that matters.<br>
<br>
There is a fair amount of varying interpretation of what USFM
markers should mean, some historical artifacts left over from when
predecessor SFM formats were in use with different meanings, and
some intentional variation from the current USFM standard. Even
so, I have 241 USFX Scripture texts in 237 dialects of 212
languages that are all "clean" enough with respect to markup that
I can and did produce web sites from them. They should be clean
enough to convert to Sword modules in an automated fashion.
Eleven of those are Public Domain. The rest are available under the
terms at <a class="moz-txt-link-freetext" href="http://PNGScriptures.org/terms.htm">http://PNGScriptures.org/terms.htm</a>. The exceptions to
markup cleanness that remain are generally problems with
peripheral materials other than the actual Scriptures, which could
be stripped out until such time as someone manually cleans them
up. <br>
<br>
Some of the metadata expected by OSIS isn't present in raw USFM
source, but I have that stored in other XML files in Haiola
project configurations, so I'll pull that in for the merge.<br>
<br>
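The merge step could look something like this sketch in Python
(again, not Haiola's C#). It assumes the project metadata has
already been read into a plain dictionary, since the layout of the
configuration files isn't described here. The field list and the
function names are hypothetical, and the output has not been checked
against the OSIS schema.<br>
<pre>
```python
import xml.etree.ElementTree as ET

# A few of the Dublin Core style children of an OSIS work element,
# in the order they are written here. Not checked against the schema.
FIELDS = ("title", "language", "rights")


def build_header(osis_work, metadata):
    """Return an OSIS-style header element built from a metadata dict."""
    header = ET.Element("header")
    work = ET.SubElement(header, "work", {"osisWork": osis_work})
    for field in FIELDS:
        value = metadata.get(field)
        if value:  # leave out fields the project does not supply
            ET.SubElement(work, field).text = value
    return header


def header_xml(osis_work, metadata):
    """Serialize the header as a string for merging into the output."""
    return ET.tostring(build_header(osis_work, metadata), encoding="unicode")
```
</pre>
Fields missing from the dictionary are simply skipped, so a project
with incomplete metadata still produces a header.<br>
<br>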
I have more texts that can be added to the set of 241 mentioned
above, but I haven't cleaned them up and processed them yet. So
much work, so little time... time to pray and code!<br>
<br>
<br>
On 11/08/2012 12:39 AM, Chris Burrell wrote:<br>
</div>
<blockquote
cite="mid:CACQnaRVT1a5pkDicW4P0D9XU9xEZ8H3xyWP7Pgw1Zw3193rnLw@mail.gmail.com"
type="cite">Thanks for all the info. On the last point, I did mean
read directly from USFM. I don't know the format well enough, but
presumably if other software uses it, then maybe we could have a
go at displaying it as best we can...
<div>
Chris</div>
<div><br>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On 8 November 2012 10:17, Peter von
Kaehne <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:refdoc@gmx.net" target="_blank">refdoc@gmx.net</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Chris,<br>
<br>
> From: Chris Burrell &lt;<a moz-do-not-send="true"
href="mailto:chris@burrell.me.uk">chris@burrell.me.uk</a>&gt;<br>
<div class="im"><br>
> I've found some instructions on transforming usfm/x
to osis on the wiki<br>
> but<br>
> was wondering how difficult it would be to automate a
lot of it?<br>
<br>
</div>
Several of us have been starting to think and experiment
with this too.<br>
<br>
Basically it is easy to automate as such. The problem is
around cleaning up.<br>
<br>
There is a thread earlier this year where some of this was
discussed. The basic plan is to use a git repository and git
hooks with scripts attached to that. Some infrastructure is
up, but not much else has happened yet.<br>
<div class="im">><br>
> is it such that there is too much manual cleaning up?<br>
<br>
</div>
Manual/mechanical cleaning up is a huge need, unfortunately.
I have not yet encountered a truly clean USFM text, despite
all claims by various USFM experts.<br>
<div class="im">><br>
> also, I was wondering if there's any appetite in
developing a driver to<br>
> read such modules, within sword or jsword...<br>
<br>
</div>
You mean to read USFM directly?<br>
<span class="HOEnZb"><font color="#888888"><br>
Peter<br>
</font></span></blockquote>
</div>
<br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
sword-devel mailing list: <a class="moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a>
<a class="moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page</pre>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<p><font color="#000000">Aloha,<br>
<i>Michael Johnson</i></font><br>
<font color="#000070"><a href="http://mljohnson.org">mljohnson.org</a><br>
PO BOX 5278<br>
KAILUA KONA HI 96745-5278<br>
USA<br>
<br>
Phone: +1 808-333-6921<br>
Skype: kahunapule</font></p>
</div>
</body>
</html>