[osis-editors] Onyx status; USFM vs. OSIS vs. custom XML

Kahunapule Michael P. Johnson Michael_Paul_Johnson at sil.org
Thu Aug 12 05:20:51 MST 2004


Warning: Jargon ahead...
Onyx -- code name for a Scripture typesetting software project using Unicode USFM Scripture files and Microsoft Word 2003
SFM -- Standard Format Markers; a very flexible way to mark Scriptures, databases, etc., with backslash codes.
USFM -- Unified Standard Format Markers; a way to mark Scripture text with backslash codes that is uniform among various Bible translation entities
Unicode -- a standard way to map every character in every living language to a unique, constant code point
XML -- Extensible Markup Language; a text-based way to express almost any kind of data
Schema -- a precise definition of a specific way to use XML for a specific application
WordML -- the XML schema used by Microsoft to define any Word Document in an open format; readable by Word 2003 and later only.
OSIS -- Open Scriptural Information Standard; a schema and documented way of using that schema to represent Scriptures and related data
TE -- Translation Editor; software under construction by JAARS for editing Bible translations
Paratext -- software currently available from UBS for editing Bible translations
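
To make the "backslash codes" in the SFM and USFM entries concrete, here is a minimal Python sketch that splits marked-up text into (marker, content) pairs. The function name tokenize_sfm and the sample text are illustrative assumptions, not part of Onyx; \c (chapter), \p (paragraph) and \v (verse) are ordinary USFM markers.

```python
import re

# A backslash code: backslash, letters, optional digits (e.g. \q1),
# and an optional trailing * that closes a character style.
MARKER = re.compile(r"\\([a-z]+[0-9]*\*?)")

def tokenize_sfm(text):
    """Split SFM/USFM text into (marker, content) pairs (hypothetical helper)."""
    pieces = MARKER.split(text)
    # pieces[0] is any text before the first marker; after that,
    # markers and their following content alternate.
    tokens = []
    for i in range(1, len(pieces), 2):
        tokens.append((pieces[i], pieces[i + 1].strip()))
    return tokens

# Illustrative sample only; the verse text is placeholder wording.
sample = "\\c 1\n\\p\n\\v 1 First verse text.\n\\v 2 Second verse text.\n"

for marker, content in tokenize_sfm(sample):
    print(marker, repr(content))
```

This is only a rough tokenizer for illustration; real USFM handling needs more care (nested character styles, footnotes, attributes).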

Currently, my top priority in the Onyx project is the reverse transformation: getting back to USFM from the Microsoft Word document (with embedded custom XML). This means that minor final edits made in Word can be exported (indirectly) back to USFM, which is a big boost for ease of use. It also means that some people who like to use Microsoft Word as a Scripture editor will probably want to use Onyx and Word together as a Scripture editing solution. I see no problem with doing that, since you could always export your work to Paratext or TE to run checks, then import it back again.

The reverse transformation relies on embedding custom XML within the WordML document. For technical reasons, this custom XML schema cannot be OSIS, but it can be one from which OSIS and/or USFM is easily generated. In the near term, I'm only planning to support USFM, since OSIS is still under development and the current version still has some serious problems with punctuation handling that make it unsuitable for general use. (I have raised those issues, as well as many less serious ones, with the OSIS editors both recently and about 8 months ago.) USFM, however, is in active use, and it is easy to convert other dialects of SFM, such as PNG SFM, to USFM.
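
As a rough illustration of the idea of generating USFM from a custom XML layer, here is a minimal Python sketch. The element and attribute names (book, chapter, para, verse, id, n, style) are invented for this example and are NOT the actual Onyx schema, which is not described here; the sketch also ignores the surrounding WordML entirely and only shows the general shape of such a conversion.

```python
import xml.etree.ElementTree as ET

def xml_to_usfm(xml_text):
    """Convert a hypothetical custom Scripture XML fragment to USFM text."""
    root = ET.fromstring(xml_text)           # assumed root: <book id="...">
    lines = [f"\\id {root.get('id')}"]
    for chapter in root.iter("chapter"):      # assumed: <chapter n="...">
        lines.append(f"\\c {chapter.get('n')}")
        for para in chapter.findall("para"):  # assumed: <para style="p">
            lines.append(f"\\{para.get('style', 'p')}")
            # Verses are assumed to be empty milestone elements; the verse
            # text is whatever follows the milestone (its "tail").
            for verse in para.findall("verse"):
                text = (verse.tail or "").strip()
                lines.append(f"\\v {verse.get('n')} {text}")
    return "\n".join(lines) + "\n"

# Illustrative input only; element names and verse text are placeholders.
SAMPLE = """<book id="GEN">
  <chapter n="1">
    <para style="p"><verse n="1"/>First verse text.<verse n="2"/>Second verse text.</para>
  </chapter>
</book>"""

print(xml_to_usfm(SAMPLE))
```

Because each XML element maps to one backslash code, the same tree could in principle be walked a second time to emit OSIS instead, which is the appeal of keeping a neutral custom schema in the middle.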

On a more practical note, I'm looking forward to typesetting David Hynum's Numanggang New Testament plus Psalms, which may be the first Scripture to be officially typeset with this system.


Kahunapule M. P. Johnson <Michael_Paul_Johnson at sil.org>
http://eBible.org/mpj/



