Posted by Bond at May 22, 2011 6:57:13 AM
Convert from RTF or HTML to OSIS (or any other importable format)
Is there a way (a utility, perhaps?) to convert from RTF or HTML to any format importable by the Sword project? RTF or HTML to OSIS, or RTF or HTML to GBF(IMP), whichever is easiest.

GBF(IMP) seems to be the easiest markup, as it more closely resembles HTML and RTF, and I successfully imported some material this way. But the lack of documentation means working out the exact syntax by trial and error. For example, I never did get indentation to work (although I did get verse references to work).

I think I know what the answer is: I will end up doing regular-expression search-and-replace passes to get the material into an importable format. Is there a sample OSIS commentary/dictionary that has examples of bold, italics, new paragraphs/lines, indentation (like a tab), and verse references (not necessarily footnotes like the sample commentary posted, but actual verse or cross references)? I'm also open to GBF(IMP) if I could find clear-cut examples of the above in a sample commentary or dictionary.
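A minimal sketch of that search-and-replace approach, in Python. It assumes the HTML only uses simple, non-nested inline tags; the rule table, the three-book lookup, and the function names are illustrative assumptions, not part of any Sword tool. The OSIS targets used are `<hi type="bold">`, `<hi type="italic">`, `<lb/>`, and `<reference osisRef="...">`; `<p>` is left alone because OSIS also has a `<p>` element.

```python
import re

# Hypothetical rule table: simple HTML inline tags -> OSIS equivalents.
RULES = [
    (re.compile(r"<(b|strong)>(.*?)</\1>", re.I | re.S), r'<hi type="bold">\2</hi>'),
    (re.compile(r"<(i|em)>(.*?)</\1>", re.I | re.S), r'<hi type="italic">\2</hi>'),
    (re.compile(r"<br\s*/?>", re.I), "<lb/>"),
]

# Illustrative subset of book names mapped to OSIS book abbreviations.
BOOKS = {"Genesis": "Gen", "John": "John", "Romans": "Rom"}
REF = re.compile(r"\b(" + "|".join(BOOKS) + r") (\d+):(\d+)\b")


def link_refs(text):
    """Wrap plain references such as 'John 3:16' in an OSIS <reference>."""
    def repl(m):
        osis = f"{BOOKS[m.group(1)]}.{m.group(2)}.{m.group(3)}"
        return f'<reference osisRef="{osis}">{m.group(0)}</reference>'
    return REF.sub(repl, text)


def html_to_osis(html):
    """Apply each tag rule in order, then mark up verse references."""
    out = html
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return link_refs(out)


if __name__ == "__main__":
    print(html_to_osis("<p>See <b>John 3:16</b><br>and <i>grace</i>.</p>"))
```

Regular expressions like these will not cope with nested or malformed tags, so the output still needs checking by hand. Indentation is left out on purpose, since the right OSIS markup for it is exactly the open question here.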


Posted by mdbergmann at May 22, 2011 11:34:28 AM
Re: Convert from RTF or HTML to OSIS (or any other importable format)

OSIS is the preferred format for Commentaries and Bibles. TEI is our preferred format for Dictionaries. Both are XML.
If you have HTML, which is not very far from XML, it should be relatively easy to convert it to OSIS or TEI.

Another possibility is ThML, which is HTML with some additional markup tags.

Have you seen the development section in our wiki?
This page about ThML might also be helpful to you.

There is no utility that I know of. But if you are familiar with XSLT, it can do XML-based transformations quite easily.
My tool of choice would be the programming language Scala, as it can do XML transformations very easily.
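As a rough illustration of such an XML-based transformation, here is a minimal sketch in Python (not XSLT or Scala) using only the standard library. It assumes the input is well-formed XHTML with a single `<body>` root. The tag mapping, the function names, and the wrapper `<div>` attributes (`type`, `annotateType`, `annotateRef`) are assumptions to check against the wiki before importing anything.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: XHTML tag -> (OSIS tag, attributes).
TAG_MAP = {
    "b": ("hi", {"type": "bold"}),
    "strong": ("hi", {"type": "bold"}),
    "i": ("hi", {"type": "italic"}),
    "em": ("hi", {"type": "italic"}),
    "br": ("lb", {}),
    "p": ("p", {}),
}


def convert(node):
    """Recursively rebuild an XHTML element as its OSIS counterpart."""
    if node.tag not in TAG_MAP:
        # Fail loudly so unmapped markup is never silently dropped.
        raise ValueError(f"no OSIS mapping for <{node.tag}>")
    tag, attrs = TAG_MAP[node.tag]
    out = ET.Element(tag, dict(attrs))
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(convert(child))
    return out


def xhtml_to_osis(xhtml, ref):
    """Convert the children of <body> into one OSIS <div> for verse `ref`."""
    body = ET.fromstring(xhtml)
    # Wrapper attributes are an assumption about commentary markup.
    div = ET.Element("div", {"type": "section",
                             "annotateType": "commentary",
                             "annotateRef": ref})
    div.text = body.text
    for child in body:
        div.append(convert(child))
    return ET.tostring(div, encoding="unicode")


if __name__ == "__main__":
    print(xhtml_to_osis("<body><p>In the <b>beginning</b><br/>God</p></body>",
                        "Gen.1.1"))
```

Working on the parsed tree handles nested tags correctly, which plain search and replace does not, but it requires the HTML to be well-formed first.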


Posted by Bond at May 22, 2011 1:11:33 PM
Re: Convert from RTF or HTML to OSIS (or any other importable format)
I took another look at OSIS. It can be overly complicated, but it can also be pretty straightforward. For simple commentaries and other resources, it may not be as bad as I thought.

It's a shame that there is no conversion utility. An automated, 100% clean conversion may not always be possible, but I think something could be made to get you 95% of the way 95% of the time. A lot of public domain content is already in HTML and RTF format. I don't mind having to make my own conversion resource, but a lot of people wouldn't bother with it. Something to think about. :)

The documentation refers to this link to show various samples. But the link on that page is dead (the domain expired some time ago and was registered by someone else). Where can I get samples like that?