[sword-devel] Why is OSIS preferred? Was Re: usfm2osis.pl
Chris Little
chrislit at crosswire.org
Tue Jul 1 06:54:47 MST 2008
Karl Kleinpaste wrote:
> "Jonathan Morgan" <jonmmorgan at gmail.com> writes:
>> ThML is also still (I think) used by the greatest percentage of our
>> modules (though that may be changed in the future).
> ...
>> Will GBF continue to be supported? I seem to remember that Chris
>> reported lack of GBF support as a missing feature in BPBible, despite
>> the fact that I'm sure that I have heard statements suggesting GBF is
>> very strongly deprecated. How many modules are still GBF?
>
> A couple shell commands will give useful summaries. Refresh main and
> beta repos in your mod.mgr, then peek in ~/.sword/InstallMgr/*/mods.d.
>
> for i in plain gbf thml osis ; do
> echo $i `grep -i ^sourcetype=$i * | wc -l`
> done
>
> Main:          Beta:
> plain   2      plain   1
> gbf    49      gbf     0
> thml  163      thml    6
> osis   23      osis   93
This is a little misleading because plain is usually unmarked. (It's the
default value of SourceType, so a grep for the key undercounts plain
modules.)
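A rough sketch of a count that treats unmarked modules as plain follows.
The function name count_sourcetypes is made up for illustration, and it
assumes one module per .conf file with the SourceType key at the start
of a line. Any value beginning with "plain" is folded into "plain", as
the grep above would match it.

```shell
# Count modules per SourceType in one mods.d directory.
# Assumption: one module per .conf file; a .conf with no SourceType
# line is counted as plain, since that is the default.
# Example: count_sourcetypes ~/.sword/InstallMgr/<repo>/mods.d
count_sourcetypes() {
    for conf in "$1"/*.conf; do
        [ -f "$conf" ] || continue
        awk -F= '
            tolower($1) == "sourcetype" {
                value = tolower($2)
                gsub(/[ \t\r]/, "", value)      # drop stray spaces and CRs
                if (value ~ /^plain/) value = "plain"
                print value
                found = 1
                exit
            }
            END { if (!found) print "plain" }   # key absent: default applies
        ' "$conf"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run once per repository directory, this prints one "type count" line
for each type it finds.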
The history of the numbers is basically that when I came to CrossWire,
there was support for plaintext, GBF, and a specialized filter for just
the RWP module. Eventually I outgrew GBF's capabilities, so I submitted
the ThML filters and started using ThML wherever it appeared that GBF
would be incapable of handling the data. Then I got this grand idea that
we should use a single format for everything so that we wouldn't have to
keep supporting n input formats times m render formats every time we
needed to add features and so that we could have a more consistent look
& feel across modules. At the time, ThML was the best we had, so lots of
things got encoded as ThML, regardless of whether they could have been
encoded as GBF. Then we got involved in OSIS, so we wrote OSIS filters
and have been, fairly consistently, releasing only OSIS (or plaintext)
Bibles.
As content gets upgraded, it will generally be upgraded to OSIS or TEI.
Likewise, new content will generally be OSIS or TEI. And everything that
gets posted in these formats will have passed schema validation.
> The reason for the new increase in beta OSIS modules is due to the
> arrival of 41 new WBT texts 2 days ago -- almost half the beta repo in
> one shot.
>
> Significantly, a couple of really important modules (LXX, for one) are
> still distributed as GBF.
>
> (Aside: All these new WBT texts appear in GS as "unknown" language. Is
> there a mapping somewhere handy, from "ngu", "tzz", et al to something
> readable by mere mortals? I'm happy to update GS to accommodate more
> language definitions but I need a source for them.)
The current ISO 639-3 table is at
http://www.sil.org/iso639-3/iso-639-3_20080529.tab
You can usually get an English-language name of the language by
extracting the LCSH value, too (after removing Bible. and possibly O.T.
or N.T.). I haven't added this info to the WBTI Bibles yet, though.
However, some of the language codes are incorrect and need to be fixed.
(The ones I know of ATM are sco, which should be cso, and xmt, which
should be mxt.)
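For the lookup itself, something like the following might do once the
table is saved locally. This is only a sketch: lang_name is a made-up
helper, and it assumes the file is tab-separated with a header row that
names an "Id" column and a "Ref_Name" column. Check the header of the
file you download before relying on it.

```shell
# Print the reference name for an ISO 639-3 code.
# Usage: lang_name CODE TABLE_FILE
# Assumption: TABLE_FILE is tab-separated and its first row is a header
# containing "Id" and "Ref_Name"; columns are located by header name.
lang_name() {
    awk -F'\t' -v code="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/\r/, "", name)
                if (name == "Id") id_col = i
                if (name == "Ref_Name") name_col = i
            }
            next
        }
        id_col && name_col && $id_col == code {
            value = $name_col
            gsub(/\r/, "", value)
            print value
            exit
        }
    ' "$2"
}
```

An unknown code prints nothing, so a front end can fall back to showing
the raw code.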
--Chris