<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; ">Greg,<DIV><BR class="khtml-block-placeholder"><DIV>Using an XML parser is actually quite viable for the parsing in osis2mod. The fundamental behavior of the program is to identify and gather all the "chunks" that need to go into the index and then call the Sword API routines to store each chunk against a key.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The Sword API keeps track of all the offsets and the size of the data as it goes. It keeps no memory of what it has already done; it knows only the current size of the output file (via tell, IIRC) and the size of what it is writing. This info is written to the index file in the slot reserved for that verse.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The key is called a verse, but it might instead be an intro to a testament, book or chapter. The other main trick in osis2mod is the identification of headings and their placement into the verse that follows. osis2mod also does some normalization of the input.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>All of this can readily be done with Xerces as the parser, using SAX, DTM or DOM, or even XSLT.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>There are drawbacks:</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>It requires a new skill set to maintain osis2mod. Several developers currently maintain it, though I have been told it's mine since I touched it last ;)</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>It requires well-formed input. The current parser does not require it, but does warn when the input is not well-formed.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>The current program works well. The new program would need extensive certification.
Or both would need to exist until we are satisfied with the replacement.</DIV><DIV><BR class="khtml-block-placeholder"></DIV><DIV>To me the biggest motivation for a rewrite would be to handle other kinds of modules besides Bibles.</DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre">        </SPAN></DIV><DIV>DM</DIV><DIV><BR><DIV><DIV>On Mar 9, 2007, at 4:04 PM, Greg Hellings wrote:</DIV><BR class="Apple-interchange-newline"><BLOCKQUOTE type="cite"><P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">When I asked about this question in the past, specifically related to</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">the utilities as you are, is when I finally received my insight into</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">how the Sword library holds its files.<SPAN class="Apple-converted-space"> </SPAN>Due to the fact that most XML</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">parsers obfuscate the actual number of bytes that have been read, and</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">since the Sword library generates an index file for the module that</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">relies on the number of bytes into the data file a certain occurrence</FONT></P> <P style="margin: 0.0px 0.0px 0.0px 0.0px"><FONT face="Helvetica" size="3" style="font: 12.0px Helvetica">is located, using a DOM or SAX parser, I was told, is not viable.</FONT></P> </BLOCKQUOTE></DIV><BR></DIV></DIV></BODY></HTML>