[sword-devel] Tool for convertion html to osis
dfhdfh at protonmail.com
Sat Feb 2 03:44:19 MST 2019
As a further thought - thinking of Greg’s examples of HTML superscripts.
It’s usually not essential to consider that verse tags are superscripted or given a fancy colour, etc.
That they are verse numbers is usually evident from their position. This being the case, squashing their character-level style makes it easier to tag them with “\v ” as part of the conversion to USFM.
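
To illustrate, here is a minimal Python sketch of that squashing step. It assumes the verse numbers arrive as <sup>N</sup> elements (with or without styling attributes); the function name and the sample markup are made up for the example, and a real source file may mark verses quite differently.

```python
import re

# Assumption: a verse number is a run of digits inside <sup>...</sup>,
# whatever class/style attributes the tag happens to carry.
VERSE_SUP = re.compile(r"<sup\b[^>]*>\s*(\d+)\s*</sup>\s*", re.IGNORECASE)


def sup_to_usfm_verses(html_line: str) -> str:
    """Replace superscripted verse numbers with USFM \\v markers."""
    return VERSE_SUP.sub(lambda m: f"\\v {m.group(1)} ", html_line).strip()


sample = '<sup class="v">1</sup>In the beginning <sup style="color:red">2</sup> And the earth'
print(sup_to_usfm_verses(sample))
```

The styling attributes are simply discarded, since the position of the number already says it is a verse marker.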
Sent from ProtonMail Mobile
On Sat, Feb 2, 2019 at 08:30, refdoc at gmx.net <refdoc at gmx.net> wrote:
> Greg has nailed it.
> In practice I first try to work out whether the file follows any kind of pattern or is just a pile of junk. Too often the latter is the case, and life has become too short to bother.
> If there is a pattern, it may be expressed in CSS, in HTML tags, or in combinations of the two. And some patterns may exist only in the actual text.
> My approach has always been to recognise as many as I can find and then nuke the rest. And then use any technology I know of (regex, XSL, whatever) to transform each bit into something useful in OSIS.
> Usually this is an iterative process with some patterns only emerging as I go along. And others not as clear as thought originally.
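
A rough Python sketch of the "recognise what you can, nuke the rest" idea follows. The rules, the class name "verse" and the simplified OSIS-style output are all hypothetical; real OSIS verses carry an osisID, and the actual patterns depend entirely on the source file.

```python
import re

# Hypothetical rules: each pair maps one recognised HTML pattern to
# simplified OSIS-style markup. New rules get added as patterns emerge.
RULES = [
    (re.compile(r'<span class="verse">(\d+)</span>'), r'<verse n="\1"/>'),
    (re.compile(r"<i>(.*?)</i>", re.S), r'<hi type="italic">\1</hi>'),
    (re.compile(r"<b>(.*?)</b>", re.S), r'<hi type="bold">\1</hi>'),
]

# Tags produced by the rules above; every other tag is treated as junk.
KEEP = {"verse", "hi"}
ANY_TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def convert(html: str) -> str:
    """Apply the known rules, then strip every tag that was not recognised."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return ANY_TAG.sub(lambda m: m.group(0) if m.group(1) in KEEP else "", html)


sample = '<div><font color="red"><span class="verse">1</span></font> <i>In</i> the beginning</div>'
print(convert(sample))
```

Because the rule list is plain data, each pass of the iterative process only needs a new entry in RULES (and, if needed, in KEEP).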
> Sent from my mobile. Please forgive shortness, typos and weird autocorrects.
> -------- Original Message --------
> Subject: Re: [sword-devel] Tool for convertion html to osis
> From: Greg Hellings
> To: SWORD Developers' Collaboration Forum
>> On its surface, this is a very straightforward process.
>> 1. Convert the HTML (which is a specific set of defined tags using the SGML grammar) into XML (not specifically targeting XHTML, as that's a slightly different grammar, but all HTML in places where it violates XML rules can be rendered into XML-compatible forms as long as it is well-formed, since XML is just a strict subset of SGML that requires certain things that SGML leaves as optional).
>> There might be other tools to do this specifically, but you can get by with the command line tool `osx` from the Open Jade framework. If you use Fedora this is available from the "opensp" package. I presume other Linux distributions have it similarly packaged.
>> 2. Convert the XML version of the HTML into OSIS using an XSLT.
>> So the simplicity of #2 really boils down to the nature of the HTML you're dealing with, and if it is exceedingly complex in its own right, how much of its own information you need to preserve in the OSIS that you're getting out the other end. And without any visibility into the file, none of the rest of us can begin to guess at the complexity of that process. But it CAN be automated. Like John, I've invested a lot of time back in the day on converting Logos XML to OSIS, and I'm happy to say these things are possible (just not always easy).
>> There are a number of people on this list who are and could be qualified to assist you if there was a lot more information to fill in all the details of what I've just described above. However, whether you can engage us will depend on the nature of the text you have, the way you've been given it, and any distribution requirements and rights that it's held under.
>>  http://openjade.sourceforge.net/
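
As a sketch of what step 2 does, the Python below walks an XML version of the HTML and maps each element to an OSIS-style one. It uses the standard library's ElementTree as a stand-in for an XSLT, and the tag mapping, the lower-case element names and the sample input are assumptions for illustration, not the output of any particular tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from HTML element names to (OSIS element, attributes).
TAG_MAP = {
    "div": ("div", {}),
    "h1": ("title", {}),
    "p": ("p", {}),
    "i": ("hi", {"type": "italic"}),
    "b": ("hi", {"type": "bold"}),
}


def to_osis(node: ET.Element) -> ET.Element:
    """Recursively copy an XML-ified HTML element into OSIS-style elements."""
    # Unknown tags fall back to a generic <seg> so that no text is lost.
    name, attrs = TAG_MAP.get(node.tag, ("seg", {}))
    out = ET.Element(name, dict(attrs))
    out.text = node.text
    for child in node:
        converted = to_osis(child)
        converted.tail = child.tail  # keep the text that follows the child
        out.append(converted)
    return out


# Assumed result of step 1: well-formed XML with lower-case element names.
xml_from_step_1 = "<div><h1>Genesis</h1><p>In the <i>beginning</i> God</p></div>"
print(ET.tostring(to_osis(ET.fromstring(xml_from_step_1)), encoding="unicode"))
```

An XSLT would express the same mapping as one template per source element; how large that mapping grows depends on how much of the HTML's structure needs to survive in the OSIS.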
>> On Fri, Feb 1, 2019 at 10:27 AM Cyrille <lafricain79 at gmail.com> wrote:
>>> It's all in the title: does someone have a Linux tool to convert html files to
>>> In this case it is for the KD module. I downloaded the HTML source files,
>>> but I don't want to spend a lot of work on them. I will work on Bible issues
>>> first, not commentaries. But if someone has a tool to do the job quickly...