<div>Since Microsoft Word can open HTML files, my first step is to open the file in Word and save it as RTF. </div><div><br></div><div>I then open the RTF file in WordPad and save it again. This reduces both the file size and the clutter. </div><div><br></div><div>At this stage, you need to determine whether any of the text styles are semantically significant. For example, are italics used for added words? And has anything important already been lost in the conversion?</div><div><br></div><div>The key point is that RTF is a text-based format, so RTF files can be processed by scripts or filters. You soon learn which tags (RTF control words) are the useful ones. </div><div><br></div><div>Saving as plain text discards all RTF formatting, so first mark any significant words with some non-RTF tags that the next step will not lose. Then open the file in WordPad and save it as Unicode text (which gives UTF-16 LE, sometimes loosely called UCS-2).</div><div><br></div><div>Open the text file in an editor such as Notepad++, change the encoding to UTF-8, and save it again. </div><div><br></div><div>Now the rest of the scripting can be done on the plain text. </div><div><br></div><div>I’ve used this mixed, general-purpose approach successfully on several projects. </div><div><br></div><div>[The first step can also be done with LibreOffice, if you prefer.]</div><div><br></div><div>Best regards,</div><div><br></div><div>David</div><div><br></div><div id="protonmail_mobile_signature_block">Sent from ProtonMail Mobile</div> <div><br></div><div><br></div>On Fri, Feb 1, 2019 at 22:07, Dudeck, John &lt;<a href="mailto:John.Dudeck@sim.org" class="">John.Dudeck@sim.org</a>&gt; wrote:<blockquote class="protonmail_quote" type="cite"> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">From my recent experience, I can say that creating OSIS from other sources is not a trivial matter. 
</span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">The process differs depending on whether you are creating a Bible, a Commentary, or a GenBook. </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">It took me two years to develop Perl scripts that convert from Logos XML to OSIS for Bibles, Commentaries, GenBooks, and Dictionaries. </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">For example, even though Logos XML is well-structured, my Bible converter is customized to the three Bible texts it has converted, and using it for other Bibles will require further customization for each one. Commentaries and GenBooks, on the other hand, it handles more generically, without the need for further customization.</span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">OSIS is mainly a semantic markup scheme, highly adapted to Scripture but to little else. Since HTML is a totally flexible structure, you need a way to map the structural elements in your source to structural elements in OSIS. OSIS also has very limited formatting capabilities, so you need a way to deal with the CSS. 
Rendering is mostly left to the client user interface.</span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">I wish I had an HTML-to-OSIS converter to offer you, but perhaps somebody else has come up with a straightforward method.</span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt">John</span></font></div> <div align="left"><font face="Arial" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">> Hello,</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">> Everything is in the title: does anyone have a Linux tool to convert HTML</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">> files to OSIS?</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">> In this case it is for the KD module. I have downloaded the HTML source files,</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">> but I don't want to spend a lot of work on them. I will work on Bible issues</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">> first, not on the commentary. 
But if someone has a tool that can do the job quickly...</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt"><br> </span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">John Dudeck</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">Programmer at Editions Cle Lyon, France</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">john.dudeck@sim.org john@editionscle.com</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">--</span></font></div> <div align="left"><font face="Arial" color="#7f0000" size="2"><span style=" font-size:10pt">"All programmers are optimists." -- Frederick Brooks</span></font></div> <div align="left"> </div> </blockquote><div><br></div><div><br></div>
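<div>For anyone who would rather script the WordPad and Notepad++ steps described above than do them by hand, here is a minimal Python sketch. It assumes WordPad-style RTF, in which italics are switched on with \i and off with \i0 rather than scoped by {\i ...} groups (those would need a real RTF parser). The [i] and [/i] markers are hypothetical placeholders, chosen because they contain no RTF special characters; any text that cannot occur in the source would do. mark_italics() swaps the italic toggles for these markers so they survive "Save as Unicode text", and utf16le_to_utf8() performs the same re-encoding as the Notepad++ step, leaving the CRLF line endings untouched. All function names here are illustrative, not part of any existing tool.</div>

```python
import re
from pathlib import Path

# Hypothetical plain-text markers: they contain no RTF special characters
# (\ { }), so WordPad keeps them as ordinary text when saving as Unicode text.
OPEN_ITALIC = "[i]"
CLOSE_ITALIC = "[/i]"

# Matches either an escaped backslash (\\), which is literal text and must be
# skipped, or the RTF italic control word \i with an optional numeric
# parameter (\i0 means "italics off") and its optional space delimiter.
# The (?![a-zA-Z]) guard keeps longer control words such as \info intact.
_ITALIC_RE = re.compile(r"\\\\|\\i(?![a-zA-Z])(-?\d+)? ?")


def mark_italics(rtf: str) -> str:
    """Replace \\i ... \\i0 toggles with plain-text markers.

    Assumes italics are toggled WordPad-style (\\i ... \\i0), not scoped
    by {\\i ...} groups.
    """
    def repl(m: re.Match) -> str:
        if m.group(0) == "\\\\":
            return m.group(0)  # escaped backslash in the text: keep it
        return CLOSE_ITALIC if m.group(1) == "0" else OPEN_ITALIC

    return _ITALIC_RE.sub(repl, rtf)


def utf16le_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 LE bytes (as written by WordPad's "Unicode text")
    as UTF-8 without a byte-order mark. Line endings are left as they are."""
    text = raw.decode("utf-16-le")
    if text.startswith("\ufeff"):
        text = text[1:]  # drop the byte-order mark
    return text.encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """File-level wrapper around utf16le_to_utf8()."""
    dst.write_bytes(utf16le_to_utf8(src.read_bytes()))


if __name__ == "__main__":
    sample = r"{\rtf1\ansi {\info{\title x}}an \i added\i0  word}"
    # prints {\rtf1\ansi {\info{\title x}}an [i]added[/i] word}
    print(mark_italics(sample))
```

<div>In this sketch, mark_italics() would run on the RTF text before the WordPad "Save as Unicode text" step, and convert_file() on the text file WordPad produces; the later plain-text scripts can then map [i]...[/i] to whatever markup the target format needs.</div>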